Article published in: Lexical semantic approaches to terminology
Edited by Pamela Faber and Marie-Claude L'Homme
[Terminology 20:2] 2014
pp. 279–303
Clustering for semantic purposes
Exploration of semantic similarity in a technical corpus
Published online: 31 October 2014
https://doi.org/10.1075/term.20.2.07ber
This paper presents an innovative approach, within the framework of distributional semantics, to the exploration of semantic similarity in a technical corpus. Complementing a previous quantitative semantic analysis conducted in the same domain, machining terminology, it sets out to uncover fine-grained semantic distinctions in order to explore the semantic heterogeneity of a number of technical items. Multidimensional scaling (MDS) analysis was carried out to cluster the first-order co-occurrences of a technical node according to the second-order and third-order co-occurrences they share. By taking into account the association values between relevant first-order and second-order co-occurrences, semantic similarities and dissimilarities between first-order co-occurrences could be determined and represented as proximities and distances on a graph. In discussing the methodology and results of statistical clustering techniques for semantic purposes, we pay special attention to their linguistic and terminological interpretation.
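The general idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the words and association values are invented, cosine distance is an assumed dissimilarity measure, and classical (metric) MDS is used as a stand-in because the abstract does not specify the MDS variant. Rows are first-order co-occurrences of a node, columns are second-order co-occurrences, and each cell holds an association value.

```python
import numpy as np

# Hypothetical toy data (all words and values invented for illustration).
# Rows: first-order co-occurrences of a technical node.
# Columns: second-order co-occurrences; cells: association values.
first_order = ["cut", "mill", "drill", "steel", "alloy", "cast_iron"]
second_order = ["speed", "depth", "spindle", "hardness", "carbon", "grade"]
assoc = np.array([
    [5.0, 4.0, 3.0, 0.0, 0.0, 0.0],
    [4.0, 3.0, 5.0, 0.0, 0.0, 0.5],
    [4.5, 4.0, 4.0, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.0, 5.0, 4.0, 3.0],
    [0.0, 0.5, 0.0, 4.0, 3.0, 5.0],
    [0.0, 0.0, 0.5, 3.0, 5.0, 4.0],
])

def cosine_distances(matrix):
    """Pairwise cosine distance between rows (an assumed dissimilarity)."""
    unit = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
    similarity = unit @ unit.T
    # Clip tiny negative values caused by floating-point rounding.
    return np.clip(1.0 - similarity, 0.0, None)

def classical_mds(dist, n_dims=2):
    """Classical (Torgerson) MDS: embed a distance matrix in n_dims."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    # Double-centre the squared distances to obtain a Gram-like matrix.
    gram = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    top = np.argsort(eigvals)[::-1][:n_dims]
    # Scale eigenvectors by the square roots of the (non-negative) eigenvalues.
    return eigvecs[:, top] * np.sqrt(np.clip(eigvals[top], 0.0, None))

dist = cosine_distances(assoc)
coords = classical_mds(dist)

# Print the 2-D coordinates that would be plotted on the graph.
for word, (x, y) in zip(first_order, coords):
    print(f"{word:10s} {x: .3f} {y: .3f}")
```

In this toy setting, first-order co-occurrences that share strongly associated second-order co-occurrences end up close together on the map, while those with little overlap are placed far apart, which is the kind of proximity pattern that is then open to linguistic interpretation.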
