This text covers the technologies of document retrieval, information extraction, and text categorization in a way that highlights their commonalities, both in general principles and in practical concerns. It assumes some mathematical background on the part of the reader, but the chapters typically begin with a non-mathematical account of the key issues. Current research topics are covered only to the extent that they inform current applications; detailed coverage of longer-term research and more theoretical treatments should be sought elsewhere. Many pointers at the ends of the chapters let the reader explore the literature. The book does, however, maintain a strong emphasis on evaluation in every chapter, in terms of both methodology and the results of controlled experimentation.