Article published in: Terminology, Vol. 30:2 (2024), pp. 159–189
Exploring terminological relations between multi-word terms in distributional semantic models
Published online: 27 June 2023
https://doi.org/10.1075/term.21053.wan
Abstract
A term is a lexical unit with a specialized meaning in a particular domain. Terms may be simple terms (STs) or multi-word
terms (MWTs). The organization of terms provides a representation of the structure of domain knowledge, which is based on the
relationships between the concepts of the domain. However, relations between MWTs are often underrepresented in terminology
resources. This work explores distributional semantic models for capturing terminological relations between multi-word terms
through lexical substitution and analogy. The experiments show that the analogy-based method globally outperforms the one based
on lexical substitution, and that analogy is well suited to the acquisition of synonymy, antonymy, and hyponymy, while lexical
substitution performs best for hypernymy.
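The analogy method mentioned in the abstract is commonly realized as the vector-offset scheme of Mikolov et al. (2013): given a seed pair (a, b) linked by a relation and a query term c, the candidate d that maximizes cos(vec(b) − vec(a) + vec(c), vec(d)) is proposed. A minimal sketch with toy multi-word-term vectors follows; the terms and three-dimensional vectors are illustrative assumptions, not data or results from the article.

```python
# Sketch of the vector-offset (analogy) method for relation acquisition.
# Toy 3-d embeddings stand in for real distributional vectors of biterms.
import math

def cos(u, v):
    # cosine similarity between two vectors given as lists of floats
    num = sum(x * y for x, y in zip(u, v))
    den = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return num / den

def analogy(vecs, a, b, c):
    # candidate d maximizing cos(vec(b) - vec(a) + vec(c), vec(d)),
    # excluding the three query terms as is standard practice
    target = [vb - va + vc for va, vb, vc in zip(vecs[a], vecs[b], vecs[c])]
    candidates = {w: v for w, v in vecs.items() if w not in (a, b, c)}
    return max(candidates, key=lambda w: cos(candidates[w], target))

# Hypothetical toy vectors: "climate warming" is meant as a synonym of
# "global warming", and "greenhouse gases" a variant of "greenhouse gas".
toy = {
    "global warming":   [0.90, 0.10, 0.00],
    "climate warming":  [0.88, 0.12, 0.02],
    "greenhouse gas":   [0.10, 0.90, 0.30],
    "greenhouse gases": [0.12, 0.88, 0.32],
    "sea level":        [0.00, 0.20, 0.90],
}

# seed synonym pair + query term -> proposed synonym of the query term
print(analogy(toy, "global warming", "climate warming", "greenhouse gas"))
# → greenhouse gases
```

On real data the same computation would run over embeddings of multi-word terms learned from a specialized corpus; the lexical-substitution alternative instead asks a masked language model to fill a slot in a knowledge-rich context.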
Article outline
- 1. Introduction
- 2. Identification of semantic relations in DSMs
- 2.1 Semantic relations acquisition using DSMs
- 2.2 Semantic relation acquisition using lexical substitution
- 2.3 Analogy for semantic relation extraction
- 3. Experimental framework
- 3.1 Main resources
- 3.1.1 Corpus
- 3.1.2 Lexical relation databases
- 3.2 Models for the lexical substitution and analogy methods
- 3.3 Distributional semantic models
- 3.4 Evaluation metrics
- 4. Acquisition of synonymy between biterms
- 4.1 Extraction of synonymous biterms from IATE
- 4.2 Acquisition of synonymy between biterms using a masked language model
- Test dataset for the MLM experiments
- Experiment
- Results
- Qualitative analysis
- 4.3 Identification of synonymy between biterms by means of analogy
- Test dataset for analogy
- Experiment
- Results
- Qualitative analysis
- 5. Acquiring other types of lexical relations
- 5.1 Generation of semantically related biterms by semantic projection
- 5.2 Acquiring the other lexical relations by means of masked language models
- Test dataset used in the MLM experiments
- Experimentation
- Results and discussion
- 5.3 Acquiring the other semantic relations by means of analogy
- Test dataset for analogy
- Experiments
- Results and discussion
- 6. Discussion
- 7. Conclusion
- Notes
References
Allen, Carl, and Timothy Hospedales. 2019. “Analogies
Explained: Towards Understanding Word Embeddings.” In International
Conference on Machine Learning, 223–31. Long Beach, California, USA: PMLR.
Arefyev, Nikolay, Boris Sheludko, Alexander Podolskiy, and Alexander Panchenko. 2020. “A
Comparative Study of Lexical Substitution Approaches Based on Neural Language Models.” ArXiv
Preprint ArXiv:2006.00031.
Barrière, Caroline. 2004. “Knowledge-Rich
Contexts Discovery.” In Conference of the Canadian Society for
Computational Studies of Intelligence, Canadian
AI, 187–201. London, Ontario, Canada: Springer.
Bernier-Colborne, Gabriel. 2017. “Aide
à l’identification de Relations Lexicales Au Moyen de La Sémantique Distributionnelle et Son Application à Un Corpus Bilingue
Du Domaine de l’environnement.” PhD Diss., Université de
Montréal.
Bernier-Colborne, Gabriel, and Patrick Drouin. 2016. “Évaluation
des modèles sémantiques distributionnels : le cas de la dérivation syntaxique (Evaluation of distributional semantic models :
The case of syntactic derivation).” In Actes de la conférence
conjointe JEP-TALN-RECITAL 2016. volume 2 : TALN (Articles
longs), 125–38. Paris, France: AFCP – ATALA.
Bojanowski, Piotr, Edouard Grave, Armand Joulin, and Tomas Mikolov. 2017. “Enriching
Word Vectors with Subword Information.” Transactions of the Association for Computational
Linguistics 5: 135–46.
Bouraoui, Zied, Jose Camacho-Collados, and Steven Schockaert. 2020. “Inducing
Relational Knowledge from BERT.” In Proceedings of the AAAI
Conference on Artificial
Intelligence, 34: 7456–63. New York, New York, United States.
Bouraoui, Zied, Shoaib Jameel, and Steven Schockaert. 2018. “Relation
Induction in Word Embeddings Revisited.” In Proceedings of the 27th
International Conference on Computational
Linguistics, 1627–37. Santa Fe, New Mexico, USA: Association for Computational Linguistics.
Bourigault, Didier. 2002. “UPERY :
Un Outil D’analyse Distributionnelle Étendue Pour La Construction D’ontologies à Partir de
Corpus.” In Actes de La 9e Conférence Sur Le Traitement Automatique
Des Langues Naturelles. Articles
Longs, 75–84. Nancy, France: ATALA.
Bullinaria, John A., and Joseph P. Levy. 2012. “Extracting
Semantic Representations from Word Co-Occurrence Statistics: Stop-Lists, Stemming, and
SVD.” Behavior Research
Methods 44 (3): 890–907.
Chaudhri, Vinay K., Justin Xu, Han Lin Aung, and Sajana Weerawardhena. 2022. “A
Corpus of Biology Analogy Questions as a Challenge for Explainable
AI.” In Bridging Human Intelligence and Artificial
Intelligence, edited by Mark V. Albert, Lin Lin, Michael J. Spector, and Lemoyne S. Dunn, 327–37. Educational
Communications and Technology: Issues and
Innovations. Cham: Springer International Publishing.
Chen, Zhiwei, Zhe He, Xiuwen Liu, and Jiang Bian. 2018. “Evaluating
Semantic Relations in Neural Word Embeddings with Biomedical and General Domain Knowledge
Bases.” BMC Medical Informatics and Decision
Making 18 (2): 53–68.
Cram, Damien, and Béatrice Daille. 2016. “Terminology
Extraction with Term Variant Detection.” In Proceedings of ACL-2016
System Demonstrations, 13–18. Berlin, Germany: Association for Computational Linguistics.
Daille, Béatrice. 2017. Term
Variation in Specialised Corpora: Characterisation, Automatic Discovery and
Applications. Vol. 19. Amsterdam / Philadelphia: John Benjamins Publishing Company.
Daille, Béatrice, and Amir Hazem. 2014. “Semi-Compositional
Method for Synonym Extraction of Multi-Word Terms.” In Proceedings of
the Ninth International Conference on Language Resources and Evaluation
(LREC’14), 1202–7. Reykjavik, Iceland: European Language Resources Association (ELRA). [URL]
Depraetere, Ilse. 2019. “Meaning
in Context and Contextual Meaning: A Perspective on the Semantics-Pragmatics Interface Applied to Modal
Verbs.” Anglophonia. French Journal of English
Linguistics, no. 28.
Devlin, Jacob, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. “BERT:
Pre-Training of Deep Bidirectional Transformers for Language
Understanding.” In Proceedings of the 2019 Conference of the North
American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short
Papers), 4171–86. Minneapolis, Minnesota: Association for Computational Linguistics.
Espinosa Anke, Luis, Joan Codina-Filba, and Leo Wanner. 2021. “Evaluating
Language Models for the Retrieval and Categorization of Lexical
Collocations.” In Proceedings of the 16th Conference of the European
Chapter of the Association for Computational Linguistics: Main
Volume, 1406–17. Online: Association for Computational
Linguistics.
Ferret, Olivier. 2021. “Exploration des relations sémantiques sous-jacentes aux plongements contextuels de
mots (Exploring semantic relations underlying contextual word
embeddings).” In Actes de la 28e Conférence sur le Traitement
Automatique des Langues Naturelles. Volume 1 : conférence
principale, 26–36. Lille, France: ATALA.
Fu, Ruiji, Jiang Guo, Bing Qin, Wanxiang Che, Haifeng Wang, and Ting Liu. 2014. “Learning
Semantic Hierarchies via Word Embeddings.” In Proceedings of the 52nd
Annual Meeting of the Association for Computational Linguistics (Volume 1: Long
Papers), 1199–1209. Baltimore, Maryland: Association for Computational Linguistics.
Gábor, Kata, Davide Buscaldi, Anne-Kathrin Schumann, Behrang QasemiZadeh, Haifa Zargayouna, and Thierry Charnois. 2018. “SemEval-2018
Task 7: Semantic Relation Extraction and Classification in Scientific
Papers.” In Proceedings of The 12th International Workshop on
Semantic Evaluation, 679–88. New Orleans, Louisiana.
Gladkova, Anna, Aleksandr Drozd, and Satoshi Matsuoka. 2016. “Analogy-Based
Detection of Morphological and Semantic Relations with Word Embeddings: What Works and What
Doesn’t.” In Proceedings of the NAACL Student Research
Workshop, 8–15. San Diego, California.
Grabar, Natalia, and Thierry Hamon. 2006. “Terminology
Structuring through the Derivational Morphology.” In International
Conference on Natural Language Processing (in
Finland), 652–63. Berlin Heidelberg: Springer.
Hashimoto, Kazuma, Pontus Stenetorp, Makoto Miwa, and Yoshimasa Tsuruoka. 2015. “Task-Oriented
Learning of Word Embeddings for Semantic Relation
Classification.” In Proceedings of the Nineteenth Conference on
Computational Natural Language
Learning, 268–78. Beijing, China: Association for Computational Linguistics.
Hazem, Amir, and Béatrice Daille. 2018. “Word
Embedding Approach for Synonym Extraction of Multi-Word
Terms.” In Proceedings of the Eleventh International Conference on
Language Resources and Evaluation (LREC
2018), 297–303. Miyazaki, Japan: European Language Resources Association (ELRA).
Hmida, Firas, Emmanuel Morin, and Béatrice Daille. 2015. “Extraction
de Contextes Riches En Connaissances En Corpus Spécialisés.” In Actes
de La 22e Conférence Sur Le Traitement Automatique Des Langues Naturelles. Articles
Courts, 109–15. Caen, France: ATALA.
Hou, Jiaqi, Xin Li, Haipeng Yao, Haichun Sun, Tianle Mai, and Rongchen Zhu. 2020. “Bert-Based
Chinese Relation Extraction for Public Security.” IEEE
Access 8: 132367–75.
Jameel, Shoaib, Zied Bouraoui, and Steven Schockaert. 2017. “Modeling
Semantic Relatedness Using Global Relation Vectors.” ArXiv Preprint
ArXiv:1711.05294.
Kilgarriff, Adam, Miloš Husák, Katy McAdam, Michael Rundell, and Pavel Rychlý. 2008. “GDEX:
Automatically Finding Good Dictionary Examples in a
Corpus.” In Proceedings of the XIII EURALEX International
Congress, 425–32. Barcelona, Spain: Documenta Universitaria.
Köper, Maximilian, Christian Scheible, and Sabine Schulte im Walde. 2015. “Multilingual
Reliability and ‘Semantic’ Structure of Continuous Word
Spaces.” In Proceedings of the 11th International Conference on
Computational Semantics, 40–45. London, UK: Association for Computational Linguistics.
Kudo, Taku, and John Richardson. 2018. “SentencePiece:
A Simple and Language Independent Subword Tokenizer and Detokenizer for Neural Text
Processing.” In Proceedings of the 2018 Conference on Empirical
Methods in Natural Language Processing: System
Demonstrations, 66–71. Brussels, Belgium: Association for Computational Linguistics.
Lafourcade, Mathieu, and Lionel Ramadier. 2016. “Semantic
Relation Extraction with Semantic Patterns Experiment on Radiology
Reports.” In Proceedings of the Tenth International Conference on
Language Resources and Evaluation
(LREC’16), 4578–82. Portorož, Slovenia: European Language Resources Association (ELRA).
Lenci, Alessandro. 2008. “Distributional
Semantics in Linguistic and Cognitive Research.” Italian Journal of
Linguistics 20 (1): 1–31.
Lenci, Alessandro, and Giulia Benotto. 2012. “Identifying
Hypernyms in Distributional Semantic Spaces.” In *SEM 2012: The First
Joint Conference on Lexical and Computational Semantics – Volume 1: Proceedings of the Main Conference and the Shared Task,
and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval
2012), 75–79. Montréal, Canada: Association for Computational Linguistics.
Levy, Omer, and Yoav Goldberg. 2014. “Linguistic Regularities in Sparse and Explicit Word Representations.” In Proceedings of the Eighteenth Conference on Computational Natural Language Learning, 171–80. Ann Arbor, Michigan: Association for Computational Linguistics.
Levy, Omer, Yoav Goldberg, and Ido Dagan. 2015. “Improving
Distributional Similarity with Lessons Learned from Word Embeddings.” Transactions of the
Association for Computational
Linguistics 3: 211–25.
Levy, Omer, Steffen Remus, Chris Biemann, and Ido Dagan. 2015. “Do
Supervised Distributional Methods Really Learn Lexical Inference
Relations?” In Proceedings of the 2015 Conference of the North
American Chapter of the Association for Computational Linguistics: Human Language
Technologies, 970–76. Denver, Colorado: Association for Computational Linguistics.
L’Homme, Marie-Claude. 2020. Lexical
Semantics for Terminology: An
Introduction. Vol. 20. Amsterdam / Philadelphia: John Benjamins Publishing Company.
Martin, Louis, Benjamin Muller, Pedro Javier Ortiz Suárez, Yoann Dupont, Laurent Romary, Éric de la Clergerie, Djamé Seddah, and Benoît Sagot. 2020. “CamemBERT:
A Tasty French Language Model.” In Proceedings of the 58th Annual
Meeting of the Association for Computational
Linguistics, 7203–19. Online: Association for
Computational Linguistics.
Meyer, Ingrid. 2001. “Extracting
Knowledge-Rich Contexts for Terminography.” In Recent Advances in
Computational Terminology, edited by Didier Bourigault, Christian Jacquemin and Marie-Claude L’Homme, 279–302. Amsterdam / Philadelphia: John Benjamins.
Mickus, Timothee, Denis Paperno, Mathieu Constant, and Kees van Deemter. 2020. “What
Do You Mean, BERT?” In Proceedings of the Society for Computation in
Linguistics 2020, 279–90. New York, New York: Association for Computational Linguistics.
Mikolov, Tomas, Wen-tau Yih, and Geoffrey Zweig. 2013. “Linguistic
Regularities in Continuous Space Word
Representations.” In Proceedings of the 2013 Conference of the North
American Chapter of the Association for Computational Linguistics: Human Language
Technologies, 746–51. Atlanta, Georgia: Association for Computational Linguistics.
Morin, Emmanuel, and Christian Jacquemin. 1999. “Projecting
Corpus-Based Semantic Links on a Thesaurus.” In Proceedings of the
37th Annual Meeting of the Association for Computational Linguistics on Computational
Linguistics, 389–96. USA: Association for Computational Linguistics.
Morlane-Hondère, François, and Cécile Fabre. 2012. “Le
Test de Substituabilité à l’épreuve Des Corpus: Utiliser l’analyse Distributionnelle Automatique Pour l’étude Des Relations
Lexicales.” In 3e
CMLF, 1: 1001–15. France: EDP Sciences.
Paullada, Amandalynne, Bethany Percha, and Trevor Cohen. 2020. “Improving
Biomedical Analogical Retrieval with Embedding of Structural
Dependencies.” In Proceedings of the 19th SIGBioMed Workshop on
Biomedical Language Processing, 38–48. Online:
Association for Computational Linguistics.
Peters, Matthew E., Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, and Luke Zettlemoyer. 2018. “Deep
Contextualized Word Representations.” In Proceedings of the 2018
Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume
1 (Long Papers), 2227–37. New Orleans, Louisiana: Association for Computational Linguistics.
Peters, Matthew E., Mark Neumann, Luke Zettlemoyer, and Wen-tau Yih. 2018. “Dissecting
Contextual Word Embeddings: Architecture and
Representation.” In Proceedings of the 2018 Conference on Empirical
Methods in Natural Language
Processing, 1499–1509. Brussels, Belgium: Association for Computational Linguistics.
Polguère, Alain. 2016. Lexicologie
et Sémantique Lexicale: Notions
Fondamentales. Montréal: Presses de l’Université de Montréal.
Qiang, Jipeng, Yun Li, Yi Zhu, Yunhao Yuan, and Xindong Wu. 2019. “A
Simple BERT-Based Approach for Lexical
Simplification.” ArXiv abs/1907.06226.
Qiao, Bo, Zhuoyang Zou, Yu Huang, Kui Fang, Xinghui Zhu, and Yiming Chen. 2022. “A
Joint Model for Entity and Relation Extraction Based on BERT.” Neural Comput.
Appl. 34 (5): 3471–81.
Roller, Stephen, Katrin Erk, and Gemma Boleda. 2014. “Inclusive
yet Selective: Supervised Distributional Hypernymy
Detection.” In Proceedings of COLING 2014, the 25th International
Conference on Computational Linguistics: Technical
Papers, 1025–36. Dublin, Ireland: Dublin City University and Association for Computational Linguistics.
Santos, Cicero Nogueira dos, Bing Xiang, and Bowen Zhou. 2015. “Classifying
Relations by Ranking with Convolutional Neural
Networks.” In Proceedings of the 53rd Annual Meeting of the
Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1:
Long Papers), 626–34. Beijing, China: Association for Computational Linguistics.
Schick, Timo, and Hinrich Schütze. 2019. “Attentive
Mimicking: Better Word Embeddings by Attending to Informative
Contexts.” In Proceedings of the 2019 Conference of the North
American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short
Papers), 489–94. Minneapolis, Minnesota: Association for Computational Linguistics.
Schick, Timo, and Hinrich Schütze. 2020. “Rare
Words: A Major Problem for Contextualized Embeddings and How to Fix It by Attentive
Mimicking.” In Proceedings of the AAAI Conference on Artificial
Intelligence, 34: 8766–74. New York, USA: AAAI Press.
Shi, Peng, and Jimmy Lin. 2019. “Simple
BERT Models for Relation Extraction and Semantic Role Labeling.” ArXiv Preprint
ArXiv:1904.05255.
Sutskever, Ilya, Oriol Vinyals, and Quoc V. Le. 2014. “Sequence
to Sequence Learning with Neural Networks.” In Proceedings of the
27th International Conference on Neural Information Processing Systems – Volume
2, 3104–12. Montreal, Canada: MIT Press.
Turney, Peter D. 2005. “Measuring Semantic Similarity
by Latent Relational Analysis.” In Proceedings of the 19th
International Joint Conference on Artificial
Intelligence, 1136–41. IJCAI’05. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc.
Verspoor, Cornelia M., Cliff Joslyn, and George J. Papcun. 2003. “The
Gene Ontology as a Source of Lexical Semantic Knowledge for a Biological Natural Language Processing
Application.” In SIGIR Workshop on Text Analysis and Search for
Bioinformatics, 51–56. Toronto, Canada.
Vylomova, Ekaterina, Laura Rimell, Trevor Cohn, and Timothy Baldwin. 2016. “Take
and Took, Gaggle and Goose, Book and Read: Evaluating the Utility of Vector Differences for Lexical Relation
Learning.” In Proceedings of the 54th Annual Meeting of the
Association for Computational Linguistics (Volume 1: Long
Papers), 1671–82. Berlin, Germany: Association for Computational Linguistics.
Weeds, Julie, Daoud Clarke, Jeremy Reffin, David Weir, and Bill Keller. 2014. “Learning
to Distinguish Hypernyms and Co-Hyponyms.” In Proceedings of COLING
2014, the 25th International Conference on Computational Linguistics: Technical
Papers, 2249–59. Dublin, Ireland: Dublin City University and Association for Computational Linguistics.
Weeds, Julie, and David Weir. 2003. “A
General Framework for Distributional Similarity.” In Proceedings of
the 2003 Conference on Empirical Methods in Natural Language Processing, 81–88. EMNLP
’03. USA: Association for Computational Linguistics.
Xue, Kui, Yangming Zhou, Zhiyuan Ma, Tong Ruan, Huanhuan Zhang, and Ping He. 2019. “Fine-Tuning
BERT for Joint Entity and Relation Extraction in Chinese Medical
Text.” In 2019 IEEE International Conference on Bioinformatics and
Biomedicine (BIBM), 892–97. San Diego, CA, USA: IEEE.
Yao, Liang, Chengsheng Mao, and Yuan Luo. 2019. “KG-BERT:
BERT for Knowledge Graph Completion.” ArXiv Preprint
ArXiv:1909.03193.
Zhang, Li, Jun Li, and Chao Wang. 2017. “Automatic
Synonym Extraction Using Word2Vec and Spectral Clustering.” In 2017
36th Chinese Control Conference
(CCC), 5629–32. Dalian, China: IEEE.
Zhou, Wangchunshu, Tao Ge, Ke Xu, Furu Wei, and Ming Zhou. 2019. “BERT-Based
Lexical Substitution.” In Proceedings of the 57th Annual Meeting of
the Association for Computational
Linguistics, 3368–73. Florence, Italy: Association for Computational Linguistics.
Cited by (2)
L’Homme, Marie-Claude. 2025. “Representing multiword expressions in terminology resources.” Terminology. International Journal of Theoretical and Applied Issues in Specialized Communication.
This list is based on CrossRef data as of 6 December 2025. Please note that it may not be complete. Sources presented here have been supplied by the respective publishers. Any errors therein should be reported to them.
