In: Computational Phraseology
Edited by Gloria Corpas Pastor and Jean-Pierre Colson
[IVITRA Research in Linguistics and Literature 24] 2020
pp. 151–176
Collecting collocations from general and specialised corpora
A comparative analysis
Marie-Claude L’Homme | Observatoire de linguistique Sens-Texte, Université de Montréal | mc.lhomme@umontreal.ca
Daphnée Azoulay | Observatoire de linguistique Sens-Texte, Université de Montréal | daphnée.azoulay@umontreal.ca
Published online: 8 May 2020
https://doi.org/10.1075/ivitra.24.08lho
Abstract
Collocations are increasingly taken into account in general and specialised repositories, and the methodologies used to collect them rely heavily on corpora. However, lexicographers and terminologists use different kinds of corpora, in which combinations are likely to behave according to specific rules and/or patterns. This contribution presents a comparative analysis of the collocational behaviour of 15 lexical items found in a general language corpus and in a specialised corpus on the theme of the environment. We automatically extracted large sets of collocates (three lists of 50 collocates) for each lexical item from each corpus and analysed different facets of collocational behaviour: the polysemy of lexical items and the characteristics of collocates (overlap, rank and semantic classes of collocates, etc.). Our aim is to draw the attention of terminologists and lexicographers to specific factors affecting the behaviour of collocations in specialised and general corpora.
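The abstract does not state which association measure or co-occurrence window was used to build the lists of 50 collocates. Purely as an illustration of the kind of comparison described (overlap and rank of collocates across two corpora), the sketch below assumes Dunning's log-likelihood (G²) as the scoring measure over precomputed co-occurrence counts. The measure, the function names `log_likelihood`, `top_collocates` and `compare_lists`, and all example words and counts are hypothetical and are not taken from the chapter.

```python
import math
from typing import Dict, List


def log_likelihood(o11: int, node_freq: int, coll_freq: int, n: int) -> float:
    """Dunning's log-likelihood ratio (G2) for a 2x2 contingency table.

    Assumption: the chapter does not name its measure; G2 is used for illustration.
    o11       -- co-occurrence frequency of the node and the candidate collocate
    node_freq -- total frequency of the node (row total)
    coll_freq -- total frequency of the candidate collocate (column total)
    n         -- sample size
    """
    observed = [
        o11,
        node_freq - o11,
        coll_freq - o11,
        n - node_freq - coll_freq + o11,
    ]
    expected = [
        node_freq * coll_freq / n,
        node_freq * (n - coll_freq) / n,
        (n - node_freq) * coll_freq / n,
        (n - node_freq) * (n - coll_freq) / n,
    ]
    # Cells with an observed count of 0 contribute 0 (limit of x * log x).
    return 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)


def top_collocates(cooc: Dict[str, int], node_freq: int,
                   word_freq: Dict[str, int], n: int, k: int = 50) -> List[str]:
    """Rank candidate collocates of one node by G2 and keep the k best."""
    scored = [
        (log_likelihood(o11, node_freq, word_freq[w], n), w)
        for w, o11 in cooc.items()
    ]
    # Descending score; ties broken alphabetically so the list is reproducible.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [w for _, w in scored[:k]]


def compare_lists(general: List[str], specialised: List[str]) -> Dict[str, object]:
    """Overlap and rank comparison of two ranked collocate lists."""
    shared = set(general) & set(specialised)
    union = set(general) | set(specialised)
    rank_g = {w: i + 1 for i, w in enumerate(general)}
    rank_s = {w: i + 1 for i, w in enumerate(specialised)}
    return {
        "shared": sorted(shared),
        "jaccard": len(shared) / len(union) if union else 0.0,
        # Positive shift: the collocate sits further down the specialised list.
        "rank_shift": {w: rank_s[w] - rank_g[w] for w in sorted(shared)},
        "general_only": [w for w in general if w not in shared],
        "specialised_only": [w for w in specialised if w not in shared],
    }
```

For instance, comparing the invented lists `["impact", "policy", "friendly", "conditions"]` (general) and `["impact", "assessment", "policy", "protection"]` (specialised) yields two shared collocates, a Jaccard overlap of 1/3 and a rank shift of +1 for *policy*. With an association measure rather than raw frequency, a frequent but weakly associated word such as *the* can rank below a rarer, more strongly associated collocate.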
Keywords: Collocation, terminology, lexicography, specialised corpus, general corpus, semantic class
Summary (French version)
General and specialised resources give an increasingly important place to collocations. The methodologies used to collect them rely mainly on corpora. However, lexicographers and terminologists draw on corpora of different kinds, in which combinations are likely to obey specific rules. This contribution presents a comparative analysis of the collocational behaviour of 15 lexical forms appearing in a general corpus and a specialised corpus on the environment. We automatically extracted long lists of collocations (three sets of 50 collocations) for each lexical item from each of the two corpora. We observe different facets of collocational behaviour: the polysemy of lexical forms and the characteristics of collocates (convergence, ranks and semantic classes of collocates, etc.). The objective is to draw the attention of terminologists and lexicographers to particular aspects of the behaviour of collocations in corpora of different kinds.
Keywords: Collocations, terminology, lexicography, specialised corpus, general corpus, semantic class
Article outline
- 1. Introduction
- 2. Lexical combinations in terminology and lexicography
- 3. A comparative analysis
- 3.1 Corpora
- 3.2 Lexical items selected
- 3.3 Automated extraction of collocations
- 4. Observations on the lists of candidate collocations
- 4.1 Overlap of candidate collocates
- 4.2 Rank of candidates
- 4.3 How collocates reveal specific meanings of items
- 5. Concluding remarks: Summary and guidelines for terminologists and lexicographers
- Acknowledgements
- Notes
- Funding
- References
