Automatic Term Extraction

Heylen, Kris; De Hertog, Dirk

doi:10.1075/hot.1.aut1

In: Handbook of Terminology: Volume 1
Edited by Hendrik J. Kockaert and Frieda Steurs
[Handbook of Terminology 1] 2015
pp. 203–221

Kris Heylen | KU Leuven

Dirk De Hertog | KU Leuven

Published online: 13 March 2015

https://doi.org/10.1075/hot.1.aut1

Article outline

1. Introduction
2. Corpus collection
3. Unithood
- 3.1 Introduction
- 3.2 Linguistic approaches
- 3.3 Statistical approaches
  - 3.3.1 Collocation measures
  - 3.3.2 Paradigmatic modifiability
  - 3.3.3 Lexical bundles
4. Termhood
- 4.1 Introduction
- 4.2 Distributional approach: TF-IDF
- 4.3 A contextual approach to TH: C/NC value
- 4.4 Morphological approaches
- 4.5 Contrastive approaches to TH
5. Term variation
6. Evaluation and validation
7. Conclusion
Notes
References

References (60)

Ahmad, Khurshid, Lee Gillam, and Lena Tostevin. 1999. “Weirdness Indexing for Logical Document Extrapolation and Retrieval (WILDER).” In The 8th Text Retrieval Conference, edited by Ellen Voorhees and Donna Harman, 717-724. Washington: National Institute of Standards and Technology.

Ananiadou, Sophia. 1994. “A methodology for automatic term recognition.” In Proceedings of the 15th conference on Computational linguistics (COLING’94), 1034-1038. Kyoto, Japan.

Assadi, Houssem and Didier Bourigault. 1996. “Acquisition et modélisation des connaissances à partir de textes: outils informatiques et éléments méthodologiques.” In Actes du 10ème congrès Reconnaissance des Formes et Intelligence Artificielle, 505-514. Rennes: Association Française pour la Cybernétique Economique et Technique.

Aubin, Sophie and Thierry Hamon. 2006. “Improving term extraction with terminological resources.” In Proceedings of the 5th international conference on Advances in Natural Language Processing, edited by Tapio Salakoski, Filip Ginter, Sampo Pyysalo and Tapio Pahikkala, 380-387. Berlin/Heidelberg: Springer-Verlag.

Baroni, Marco and Silvia Bernardini. 2004. “BootCaT: Bootstrapping Corpora and Terms from the Web.” In Proceedings of the Fourth International Conference On Language Resources And Evaluation, edited by Maria Teresa Lino et al., 1313-1316. Lisbon, Portugal: European Language Resources Association.

Basili, Roberto, Alessandro Moschitti, Fabio Massimo Zanzotto, and Maria Teresa Pazienza. 2001. “Modelling Syntactic Context in Automatic Term Extraction.” In Proceedings of Recent Advances in Natural Language Processing, edited by Nicolas Nicolov and Ruslan Mitkov, 28-34. Amsterdam/Philadelphia: John Benjamins.

Biber, Douglas. 1993. “Representativeness in Corpus Design.” Literary and Linguistic Computing 8(4):243-257.

Biber, Douglas and Susan Conrad. 1999. “Lexical bundles in conversation and academic prose.” Language and Computers 26:181-190.

Bourigault, Didier. 1992. “Surface grammatical analysis for the extraction of terminological noun phrases.” In Proceedings of 14th International Conference on Computational Linguistics, edited by Christian Boitet, 977-981. Stroudsburg, PA, USA: Association for Computational Linguistics.

Bourigault, Didier and Christian Jacquemin. 1999. “Term extraction + term clustering: An integrated platform for computer-aided terminology.” In Proceedings of the ninth conference on European Chapter of the Association for Computational Linguistics (EACL), Bergen, 15-22. Stroudsburg, PA, USA: Association for Computational Linguistics.

Cabré Castellví, M. Teresa, Rosa Estopà, and Jordi Vivaldi. 2001. “Automatic term detection: a review of current systems.” In Recent Advances in Computational Terminology, edited by Didier Bourigault, Christian Jacquemin and Marie-Claude L’Homme, 53-88. Natural Language Processing, vol. 2. Amsterdam: John Benjamins Publishing Company.

Chung, Teresa Mihwa. 2003. “A corpus comparison approach for terminology extraction.” Terminology 9(2):221-246.

Church, Kenneth and Patrick Hanks. 1990. “Word association norms, mutual information, and lexicography.” Computational Linguistics 16(1):22-29.

Da Silva, Joaquim, Gaël Dias, Sylvie Guilloré, and José Pereira Lopes. 1999. “Using LocalMaxs Algorithm for the Extraction of Contiguous and Non-contiguous Multiword Lexical Units.” In Proceedings of the 9th Portuguese Conference on Artificial Intelligence: Progress in Artificial Intelligence, edited by Pedro Barahona and José Júlio Alferes, 113-132. London, UK: Springer-Verlag.

Daille, Béatrice. 1994. “Study and Implementation of Combined Techniques for Automatic Extraction of Terminology.” In The Balancing Act: Combining Symbolic and Statistical Approaches to Language. Workshop at the 32nd Annual Meeting of the Association for Computational Linguistics, 29-36. Stroudsburg, PA, USA: Association for Computational Linguistics.

Daille, Béatrice. 1996. “Study and Implementation of Combined Techniques for Automatic Extraction of Terminology.” In The Balancing Act: Combining Symbolic and Statistical Approaches to Language, edited by Philip Resnik and Judith L. Klavans, 49-66. Cambridge, MA, USA: MIT Press.

Daille, Béatrice. 2005. “Variations and application-oriented terminology engineering.” Terminology 11(1):181-197.

Daille, Béatrice, Eric Gaussier, and Jean-Marc Langé. 1994. “Towards automatic extraction of monolingual and bilingual terminology.” In Proceedings of the 15th International Conference on Computational Linguistics, 515-521. Stroudsburg, PA, USA: Association for Computational Linguistics.

Drouin, Patrick. 2003. “Term extraction using non-technical corpora as a point of leverage.” Terminology 9(1):99-115.

Drouin, Patrick. 2006. “Termhood: Quantifying the Relevance of a Candidate Term.” Linguistic Insights. Studies in Language and Communication 36:375-391.

Drouin, Patrick and Frédéric Doll. 2008. “Quantifying Termhood Through Corpus Comparison.” In Terminology and Knowledge Engineering (TKE-2008), 191-206. Copenhagen, Denmark: Copenhagen Business School.

Dunning, Ted. 1993. “Accurate methods for the statistics of surprise and coincidence.” Computational Linguistics 19(1):61-74.

Evans, David, Natasa Milic-Frayling, and Robert Lefferts. 1995. “Clarit TREC-4 Experiments.” In NIST Special Publication 500-236, edited by Donna Harman, 305-322.

Evert, Stefan. 2004. “The Statistics of Word Cooccurrences: Word Pairs and Collocations.” PhD diss., University of Stuttgart.

Frantzi, Katerina, Sophia Ananiadou, and Hideki Mima. 2000. “Automatic recognition of multi-word terms: The C-value/NC-value method.” International Journal on Digital Libraries 3(2):115-130.

Foo, Jody. 2012. “Computational Terminology: Exploring Bilingual and Monolingual Term Extraction.” PhD diss., Linköping University.

Foo, Jody and Magnus Merkel. 2010. “Computer aided term bank creation and standardization: Building standardized term banks through automated term extraction and advanced editing tools.” In Terminology in Everyday Life, edited by Marcel Thelen and Frieda Steurs, 163-180. New York: John Benjamins.

Groc, Clément de. 2011. “Babouk: Focused Web Crawling for Corpus Compilation and Automatic Terminology Extraction.” In Proceedings of the International Conference on Web Intelligence and Intelligent Agent Technology, edited by Olivier Boissier, Boualem Benatallah, Mike P. Papazoglou, Zbigniew W. Ras and Mohand-Said Hacid, 497-498. IEEE Computer Society.

Justeson, John S. and Slava M. Katz. 1995. “Technical terminology: some linguistic properties and an algorithm for identification in text.” Natural Language Engineering 1(1):9-27.

Kageura, Kyo. 2009. “Computing the potential lexical productivity of head elements in nominal compounds using the textual corpus.” Progress in Informatics 6:49-56.

Kageura, Kyo and Bin Umino. 1996. “Methods of automatic term recognition: a review.” Terminology 3(2):259-289.

Kit, Chunyu. 2002. “Corpus tools for retrieving and deriving termhood evidence.” In 5th East Asia Forum of Terminology, 69-80. Haikou, China.

Kit, Chunyu and Xiaoyue Liu. 2008. “Measuring mono-word termhood by rank difference via corpus comparison.” Terminology 14(2):204-229.

Korkontzelos, Ioannis, Ioannis Klapaftis, and Suresh Manandhar. 2008. “Reviewing and Evaluating Automatic Term Recognition Techniques.” In Proceedings of the 6th International Conference on Natural Language Processing, edited by Bengt Nordström and Aarne Ranta, 248-259. Berlin/Heidelberg, Germany: Springer.

Liu, Xiaoyue and Chunyu Kit. 2009. “Statistical termhood measurement for mono-word terms via corpus comparison.” In Proceedings of the Eighth International Conference on Machine Learning and Cybernetics, 3499-3504. IEEE Computer Society.

Manning, Christopher and Hinrich Schütze. 1999. Foundations of Statistical Natural Language Processing. Cambridge, MA, USA: MIT Press.

Matsuo, Yutaka and Mitsuru Ishizuka. 2004. “Keyword extraction from a single document using word co-occurrence statistical information.” International Journal on Artificial Intelligence Tools 13(1):157-169.

Maynard, Diana and Sophia Ananiadou. 1999. “Identifying Contextual Information for Multi-Word Term Extraction.” In Proceedings of the TKE ’99 International Congress on Terminology and Knowledge Engineering, edited by Peter Sandrini, 212-221. Vienna, Austria: TermNet.

McEnery, Tony, Richard Xiao, and Yukio Tono, editors. 2006. Corpus-based Language Studies: An Advanced Resource Book. London, UK: Routledge.

Medelyan, Olena and Ian H. Witten. 2006. “Thesaurus based automatic keyphrase indexing.” In Proceedings of the 6th ACM/IEEE-CS joint conference on Digital libraries, edited by Gary Marchionini, Michael L. Nelson and Catherine C. Marshall, 296-297. New York, USA: Association for Computing Machinery.

Nakagawa, Hiroshi. 2000. “Automatic Term Recognition based on Statistics of Compound Nouns.” Terminology 6(2):195-210.

Nakagawa, Hiroshi and Tatsunori Mori. 1998. “Nested collocation and compound noun for term recognition.” In Proceedings of the First Workshop on Computational Terminology, edited by Didier Bourigault, Christian Jacquemin, and Marie-Claude L’Homme, 64-70. Montreal, Canada: Université de Montréal.

Nakagawa, Hiroshi and Tatsunori Mori. 2002. “A simple but powerful automatic term extraction method.” In Proceedings of the Second International Workshop on Computational Terminology, 1-7. Stroudsburg, PA, USA: Association for Computational Linguistics.

Nenadic, Goran, Sophia Ananiadou, and John McNaught. 2004. “Enhancing automatic term recognition through recognition of variation.” In Proceedings of the 20th international Conference on Computational Linguistics. Stroudsburg, PA, USA: Association for Computational Linguistics.

Pantel, Patrick and Dekang Lin. 2001. “A Statistical Corpus-Based Term Extractor.” In Proceedings of the 14th Biennial Conference of the Canadian Society on Computational Studies of Intelligence: Advances in Artificial Intelligence, edited by Eleni Stroulia and Stan Matwin, 36-46. Lecture Notes In Computer Science, vol. 2056. London: Springer-Verlag.

Pazienza, Maria Teresa, Marco Pennacchiotti, and Fabio Massimo Zanzotto. 2005. “Terminology extraction: an analysis of linguistic and statistical approaches.” In Knowledge Mining, edited by Spiros Sirmakessis. Series: Studies in Fuzziness and Soft Computing, Vol.185. Springer-Verlag.

Pecina, Pavel and Pavel Schlesinger. 2006. “Combining association measures for collocation extraction.” In Proceedings of the COLING/ACL on Main Conference Poster Sessions Annual Meeting of the ACL, 651-658. Morristown, NJ: Association for Computational Linguistics.

Rizzo, Camino R. 2010. “Getting on with corpus compilation: from theory to practice.” English for Specific Purposes World, Issue 1(27), vol. 9. [URL].

Sager, Juan C. 1978. Commentary by Prof. Juan Carlos Sager. In Actes Table Ronde sur les Problèmes du Découpage du Terme, edited by G. Rondeau, 39-74. Montréal: Commission de Terminologie de l’AILA.

Salton, Gerard, Andrew Wong, and Chung-Su Yang. 1975. “A vector space model for automatic indexing.” Communications of the ACM 18:613-620.

Sclano, Francesco and Paola Velardi. 2007. “Termextractor: a web application to learn the common terminology of interest groups and research communities.” In Proceedings of the 7th Conference on Terminology and Artificial Intelligence (TIA-2007), Sophia Antipolis.

Scott, Mike. 1997. “The Right Word in the Right Place: Key Word Associates in Two Languages.” AAA - Arbeiten aus Anglistik und Amerikanistik 22(2):239-252.

Simpson-Vlach, Rita and Nick Ellis. 2010. “An Academic Formulas List: New Methods in Phraseology Research.” Applied Linguistics 31:487-512.

Thurmair, Gregor. 2003. “Making Term Extraction Tools Usable.” In Proceedings of the Joint Conference of the 8th Workshop of the European Association for Machine Translation and the 4th Controlled Language Applications Workshop. Dublin: European Association for Machine Translation.

Vivaldi, Jordi and Horacio Rodriguez. 2007. “Evaluation of terms and term extraction systems - A practical approach.” Terminology 13(2):225-248.

Vivaldi, Jordi, Lluis Màrquez, and Horacio Rodríguez. 2001. “Improving Term Extraction by System Combination Using Boosting.” In Machine Learning ECML 2001, edited by Luc de Raedt and Peter Flach, 515-526. Series: Lecture Notes in Computer Science, vol. 2167. Springer.

Wermter, Joachim and Udo Hahn. 2005. “Paradigmatic Modifiability Statistics for the Extraction of Complex Multi-Word Terms.” In Proceedings of the Human Language Technology Conference and the Conference on Empirical Methods in Natural Language Processing, 843-850. Association for Computational Linguistics.

Wiechmann, Daniel. 2008. “On the Computation of Collostruction Strength: Testing Measures of Association as Expressions of Lexical Bias.” Corpus Linguistics and Linguistic Theory 4(2):253-290.

Wong, Wilson, Wei Liu, and Mohammed Bennamoun. 2007. “Determining termhood for learning domain ontologies using domain prevalence and tendency.” In Proceedings of the Sixth Australasian Conference on Data Mining and Analytics, edited by Peter Christen, Paul Kennedy, Jiuyong Li, Inna Kolyshkina and Graham Williams, 47-54. Australian Computer Society.

Zhang, Ziqi, José Iria, Christopher Brewster, and Fabio Ciravegna. 2008. “A Comparative Evaluation of Term Recognition Algorithms.” In Proceedings of the Sixth Language Resources and Evaluation Conference (LREC 2008), Marrakech, Morocco.

Cited by (17)

Heinisch, Barbara. 2025. Next-gen terminology: Transforming terminology work with large language models. Across Languages and Cultures 26:S, pp. 64 ff.

Steurs, Frieda & Dirk Kinable. 2025. Efforts and challenges in translating concept to reality. In Handbook of Terminology [Handbook of Terminology, 4], pp. 377 ff.

Wissik, Tanja. 2025. Impact of automatic term extraction on terminology work. Terminology. International Journal of Theoretical and Applied Issues in Specialized Communication 31:1, pp. 110 ff.

Das, Bidyut, Mukta Majumder, Santanu Phadikar & Arif Ahmed Sekh. 2024. Biomedical term extraction using fuzzy association. Soft Computing 28:6, pp. 5699 ff.

Chiocchetti, Elena, Vesna Lušicky & Tanja Wissik. 2023. Multilingual legal terminology databases. In Handbook of Terminology [Handbook of Terminology, 3], pp. 458 ff.

Geeraerts, Dirk, Dirk Speelman, Kris Heylen, Mariana Montes, Stefano De Pascale, Karlien Franco & Michael Lang. 2023. Lexical Variation and Change.

Marín, María José. 2023. Automatic term recognition and legal language. In Handbook of Terminology [Handbook of Terminology, 3], pp. 511 ff.

Barbero, Chiara. 2022. CQL Grammars for Lexical and Semantic Information Extraction for Portuguese and Italian. In Computational Processing of the Portuguese Language [Lecture Notes in Computer Science, 13208], pp. 376 ff.

Bowker, Lynne. 2022. Pivoting to support science communication in times of crisis. In Science Communication in Times of Crisis [Discourse Approaches to Politics, Society and Culture, 96], pp. 65 ff.

Vo, Chau, Tru Cao, Ngoc Truong, Trung Ngo & Dai Bui. 2022. Automatic medical term extraction from Vietnamese clinical texts. Terminology. International Journal of Theoretical and Applied Issues in Specialized Communication 28:2, pp. 299 ff.

Wu, Junfeng, Guangyan Huang & Roozbeh Zarei. 2022. ETBTRank: Ranking Biterms in Paper Titles for Emerging Topic Discovery. In AI 2021: Advances in Artificial Intelligence [Lecture Notes in Computer Science, 13151], pp. 775 ff.

Hoste, Veronique, Klaar Vanopstal, Ayla Rigouts Terryn & Els Lefever. 2019. The Trade-off between Quantity and Quality. Comparing a Large Crawled Corpus and a Small Focused Corpus for Medical Terminology Extraction. Across Languages and Cultures 20:2, pp. 197 ff.

Miyata, Rei & Kyo Kageura. 2018. Building controlled bilingual terminologies for the municipal domain and evaluating them using a coverage estimation approach. Terminology. International Journal of Theoretical and Applied Issues in Specialized Communication 24:2, pp. 149 ff.

Arora, Chetan, Mehrdad Sabetzadeh, Lionel Briand & Frank Zimmer. 2017. Automated Extraction and Clustering of Requirements Glossary Terms. IEEE Transactions on Software Engineering 43:10, pp. 918 ff.

Oliver, Antoni. 2017. A system for terminology extraction and translation equivalent detection in real time. Machine Translation 31:3, pp. 147 ff.

Bowker, Lynne & Tom Delsey. 2016. Information science, terminology and translation Studies. In Border Crossings [Benjamins Translation Library, 126], pp. 73 ff.

Nugumanova, Aliya, Igor Bessmertny, Yerzhan Baiburin & Madina Mansurova. 2016. A New Operationalization of Contrastive Term Extraction Approach Based on Recognition of Both Representative and Specific Terms. In Knowledge Engineering and Semantic Web [Communications in Computer and Information Science, 649], pp. 103 ff.

This list is based on CrossRef data as of 11 December 2025. Please note that it may not be complete. Sources presented here have been supplied by the respective publishers. Any errors therein should be reported to them.