In: Current Issues in Phraseology
Edited by Sebastian Hoffmann, Bettina Fischer-Starcke and Andrea Sand
[Benjamins Current Topics 74] 2015
pp. 135–164
50-something years of work on collocations
What is or should be next …
Published online: 10 July 2015
https://doi.org/10.1075/bct.74.07gri
This paper explores ways in which research into collocation should be improved. After a discussion of the parameters underlying the notion of collocation, the paper has three main parts. First, I argue that corpus linguistics would benefit from taking more seriously the understudied fact that collocations are not necessarily symmetric, contrary to what most association measures imply. I also introduce an association measure from the associative learning literature that can identify asymmetric collocations, and I show that it also distinguishes well between collocations with high and low association strengths. Second, I summarize some advantages of this measure and brainstorm about ways in which it can help re-examine previous studies as well as support further applications. Finally, I adopt a broader perspective and discuss a variety of ways in which all association measures – directional or not – in corpus linguistics should be improved in order for us to obtain better and more reliable results.
Keywords: association measure, collocation, directionality, dispersion, DP (delta P)
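The abstract and keywords name delta P (DP) as a directional association measure but do not give a formula on this page. As a minimal sketch, assuming the standard definition from the associative learning literature (the probability of an outcome given a cue minus its probability in the absence of that cue), the two directions for a word pair could be computed from a 2x2 co-occurrence table as follows. The class name, function names, and counts are hypothetical and chosen for illustration only; they are not taken from the chapter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cooccurrence:
    """2x2 co-occurrence table for a word pair (w1, w2). Names are illustrative."""
    both: int      # slots where w1 and w2 co-occur
    w1_only: int   # slots with w1 but not w2
    w2_only: int   # slots with w2 but not w1
    neither: int   # slots with neither word


def _ratio(numerator: int, denominator: int) -> float:
    """Divide, raising a clear error when a marginal total is zero."""
    if denominator == 0:
        raise ValueError("marginal total is zero; delta P is undefined")
    return numerator / denominator


def delta_p_w2_given_w1(t: Cooccurrence) -> float:
    """P(w2 | w1) - P(w2 | not w1): how well w1 predicts w2."""
    return (_ratio(t.both, t.both + t.w1_only)
            - _ratio(t.w2_only, t.w2_only + t.neither))


def delta_p_w1_given_w2(t: Cooccurrence) -> float:
    """P(w1 | w2) - P(w1 | not w2): how well w2 predicts w1."""
    return (_ratio(t.both, t.both + t.w2_only)
            - _ratio(t.w1_only, t.w1_only + t.neither))


if __name__ == "__main__":
    # Invented counts: w1 is rare and always followed by w2,
    # while w2 is frequent and mostly occurs without w1.
    table = Cooccurrence(both=10, w1_only=0, w2_only=90, neither=900)
    print(round(delta_p_w2_given_w1(table), 3))
    print(round(delta_p_w1_given_w2(table), 3))
```

With these invented counts the first direction comes out at roughly 0.909 and the second at 0.1, which illustrates the kind of asymmetry the abstract refers to: one word can strongly predict its partner without the reverse holding. A symmetric measure would assign the pair a single score and hide that difference.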
