Article published in: Terminology, Vol. 22:2 (2016), pp. 141–170
Distributional analysis applied to terminology extraction
First results in the domain of psychiatry in Spanish
Published online: 21 February 2017
https://doi.org/10.1075/term.22.2.01naz
This paper presents the first results of a new method for terminology extraction based on distributional analysis. The intuition behind the algorithm is that single-word or multi-word lexical units that refer to specialised concepts show a characteristic co-occurrence pattern: they tend to appear in the same contexts as other conceptually related terms. For example, the term fluoxetine systematically appears in the same sentences as related terms such as depression, serotonin reuptake inhibitor and obsessive–compulsive disorder. Terms also co-occur with general vocabulary units, but without the characteristic pattern that arises when a conceptual relation holds. The method was evaluated experimentally on a corpus of psychiatry journals from Spain and Latin America, and its results were significantly better than those of other methods.
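The co-occurrence intuition described in the abstract can be illustrated with a small sketch. It is not the algorithm evaluated in the paper, whose details are not given here. It only counts which units share a sentence and then applies a hypothetical score, `recurrence_score`, that measures how much of a unit's co-occurrence comes from partners seen more than once. The toy corpus, the function names and the score itself are assumptions made for illustration.

```python
from collections import Counter, defaultdict
from itertools import combinations


def cooccurrence_profiles(sentences):
    """Map each unit to a Counter of the units that share a sentence with it."""
    profiles = defaultdict(Counter)
    for sentence in sentences:
        units = sorted(set(sentence))  # count each unit once per sentence
        for a, b in combinations(units, 2):
            profiles[a][b] += 1
            profiles[b][a] += 1
    return profiles


def recurrence_score(profile):
    """Hypothetical score (an assumption, not the paper's measure):
    share of co-occurrence events that involve a partner seen more than once."""
    total = sum(profile.values())
    if total == 0:
        return 0.0
    repeated = sum(count for count in profile.values() if count > 1)
    return repeated / total


# Toy corpus (invented): each sentence is a list of already segmented units.
sentences = [
    ["fluoxetine", "depression", "patient"],
    ["fluoxetine", "depression", "serotonin"],
    ["fluoxetine", "serotonin", "study"],
    ["result", "patient", "table"],
    ["result", "study", "year"],
    ["result", "method", "group"],
]

profiles = cooccurrence_profiles(sentences)
for unit in ("fluoxetine", "result"):
    print(unit, round(recurrence_score(profiles[unit]), 2))
```

In this toy corpus, fluoxetine keeps meeting the same partners (depression, serotonin), while the general unit result meets a different partner each time, so fluoxetine gets the higher score. A real system would also need candidate segmentation, lemmatisation and a statistically grounded measure, none of which is shown here.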
