In: Handbook of Terminology: Volume 1
Edited by Hendrik J. Kockaert and Frieda Steurs
[Handbook of Terminology 1] 2015
pp. 203–221
Automatic Term Extraction
Published online: 13 March 2015
https://doi.org/10.1075/hot.1.aut1
Article outline
- 1. Introduction
- 2. Corpus collection
- 3. Unithood
- 3.1 Introduction
- 3.2 Linguistic approaches
- 3.3 Statistical approaches
- 3.3.1 Collocation measures
- 3.3.2 Paradigmatic modifiability
- 3.3.3 Lexical bundles
- 4. Termhood
- 4.1 Introduction
- 4.2 Distributional approach: TF-IDF
- 4.3 A contextual approach to TH: C/NC value
- 4.4 Morphological approaches
- 4.5 Contrastive approaches to TH
- 5. Term variation
- 6. Evaluation and validation
- 7. Conclusion