Article published in: International Journal of Corpus Linguistics
Vol. 23:1 (2018) ► pp.85–113
Collocation and word association
Comparing collocation measuring methods
Published online: 31 May 2018
https://doi.org/10.1075/ijcl.15116.kan
Abstract
This paper studies the relationship between grammar and language use by comparing word association and collocation. Since word association reveals mental semantic knowledge, usage-based approaches expect word association to mirror the relation between words in use, namely collocation. The paragraph is a more apt unit for collocation than the sentence in mirroring word association. Among measures of collocation, (simple) log likelihood and t-score turn out to be more consistent with association, with log likelihood leading by a small margin over MI or MI3. Overall, word association and collocation are quite close, but not perfectly close because of differences in relevant resources and the characteristics of lexical/semantic relations.
Keywords: word association, collocation, rank measure, semantic relations, language use
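The four collocation measures named in the abstract can be illustrated with a minimal sketch. It assumes the standard "simple" formulations based on an observed co-occurrence count O and an expected count E = f1 * f2 / N, as given in Evert (2009); the article's exact computation is not shown on this page, so the function name, parameter names, and sample counts below are hypothetical. Here N stands for the number of co-occurrence units (for example paragraphs or sentences) in the corpus.

```python
import math


def association_scores(o, f1, f2, n):
    """Score one word pair with four simple association measures.

    o  -- observed number of units in which both words co-occur (must be > 0)
    f1 -- number of units containing the first word
    f2 -- number of units containing the second word
    n  -- total number of units in the corpus

    Assumption: these are the textbook "simple" formulas, not necessarily
    the exact variants used in the article.
    """
    if o <= 0:
        raise ValueError("observed count must be positive")
    e = f1 * f2 / n  # expected co-occurrence count under independence
    return {
        "MI": math.log2(o / e),
        "MI3": math.log2(o ** 3 / e),
        "t-score": (o - e) / math.sqrt(o),
        "simple-ll": 2 * (o * math.log(o / e) - (o - e)),
    }


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
scores = association_scores(o=8, f1=100, f2=200, n=10_000)
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

Ranking the candidate collocates of a cue word by each score, and then comparing those rankings with the ranking of association responses, is one way the consistency described in the abstract could be assessed.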
Article outline
- 1. Introduction
- 2. Collocation and word association
- 3. Methodology
- 4. Collocation in paragraphs and sentences
- 5. Parts of speech and collocation
- 6. Frequency of words and collocation
- 7. Lexical and semantic relations in word association and collocation
- 8. Conclusion
- Acknowledgements
- Notes
- References
References (41)
Baroni, M., & Evert, S. (2009). Statistical methods for corpus exploitation. In A. Lüdeling & M. Kytö (Eds.), Corpus Linguistics: An International Handbook, Vol. 2 (pp. 777–803). Berlin: Walter de Gruyter.
Bednarek, M. (2008). Semantic preferences and semantic prosody re-examined. Corpus Linguistics and Linguistic Theory, 4(2), 119–139.
Brezina, V., McEnery, T., & Wattam, S. (2015). Collocations in context: A new perspective on collocation networks. International Journal of Corpus Linguistics, 20(2), 139–173.
Burger, H., Dobrovol’skij, D., Kühn, P., & Norrick, N. R. (2007). Phraseology: Subject area, terminology and research topics. In H. Burger, D. Dobrovol’skij, P. Kühn & N. R. Norrick (Eds.), Phraseology: An International Handbook of Contemporary Research (pp. 10–19). Berlin: Walter de Gruyter.
Church, K. W., & Hanks, P. (1990). Word association norms, mutual information, and lexicography. Computational Linguistics, 16(1), 22–29.
Dunning, T. (1993). Accurate methods for the statistics of surprise and coincidence. Computational Linguistics, 19(1), 61–74.
Durrant, P., & Doherty, A. (2010). Are high-frequency collocations psychologically real? Investigating the thesis of collocational priming. Corpus Linguistics and Linguistic Theory, 6(2), 125–155.
Evert, S. (2009). Corpora and collocations. In A. Lüdeling & M. Kytö (Eds.), Corpus Linguistics: An International Handbook, Vol. 2 (pp. 1212–1248). Berlin: Walter de Gruyter.
Firth, J. R. (1957). Modes of meaning. In J. R. Firth, Papers in Linguistics. London: Oxford University Press.
Glynn, D. (2010). Synonymy, lexical fields, and grammatical constructions: A study in usage-based cognitive semantics. In H.-J. Schmid & S. Handl (Eds.), Cognitive Foundations of Usage Patterns (pp. 89–117). Berlin: De Gruyter Mouton.
Glynn, D., & Robinson, J. A. (Eds.) (2014). Corpus Methods for Semantics: Quantitative Studies in Polysemy and Synonymy. Amsterdam/Philadelphia: John Benjamins.
Goldberg, A. E. (2006). Constructions at Work: The Nature of Generalizations in Language. Oxford: Oxford University Press.
Gries, S. Th. (2013). 50-something years of work on collocation: What is or should be next … International Journal of Corpus Linguistics, 18(1), 137–165.
Hunston, S., & Francis, G. (1999). Pattern Grammar: A Corpus-Driven Approach to the Lexical Grammar of English. Amsterdam/Philadelphia: John Benjamins.
Kent, G. H., & Rosanoff, A. J. (1910). A study of association in insanity. American Journal of Insanity, 67(2), 317–390.
Kiss, G., Armstrong, C., Milroy, R., & Piper, J. (1973). An associative thesaurus of English and its computer analysis. In A. J. Aitken, R. W. Bailey & N. Hamilton-Smith (Eds.), The Computer and Literary Studies (pp. 153–165). Edinburgh: Edinburgh University Press.
Langacker, R. W. (1987). Foundations of Cognitive Grammar, Vol. 1: Theoretical Prerequisites. Stanford: Stanford University Press.
Leech, G., Rayson, P., & Wilson, A. (2001). Word Frequencies in Written and Spoken English: Based on the British National Corpus. London: Longman.
Louw, B. (1993). Irony in the text or insincerity in the writer. In M. Baker, G. Francis & T. Tognini-Bonelli (Eds.), Text and Technology: In Honor of John Sinclair (pp. 157–176). Amsterdam/Philadelphia: John Benjamins.
Manning, C. D., & Schütze, H. (1999). Foundations of Statistical Natural Language Processing. Cambridge, MA: The MIT Press.
McEnery, T., & Hardie, A. (2012). Corpus Linguistics: Method, Theory and Practice. Cambridge: Cambridge University Press.
McGee, I. (2009). Adjective-noun collocations in elicited corpus data: Similarities, differences, and the whys and wherefores. Corpus Linguistics and Linguistic Theory, 5(1), 79–103.
Michelbacher, L., Evert, S., & Schütze, H. (2011). Asymmetry in corpus-derived and human word associations. Corpus Linguistics and Linguistic Theory, 7(2), 245–276.
Mollin, S. (2009). Combining corpus linguistic and psychological data on word co-occurrences: Corpus collocates versus word associations. Corpus Linguistics and Linguistic Theory, 5(2), 175–200.
Padó, S., & Lapata, M. (2007). Dependency-based construction of semantic space models. Computational Linguistics, 33(2), 161–199.
Rapp, R. (2002). The computation of word associations: Comparing syntagmatic and paradigmatic approaches. In S. C. Tseng, T. E. Chen & Y. E. Liu (Eds.), COLING 2002: Proceedings of the 19th International Conference on Computational Linguistics, Vol. 1 (pp. 1–7). Taipei: Howard International House.
Schulte im Walde, S., Melinger, A., Roth, M., & Weber, A. (2008). An empirical characterisation of response types in German association norms. Research on Language and Computation, 6(2), 205–238.
Scott, M., & Tribble, C. (2006). Textual Patterns: Key Words and Corpus Analysis in Language Education. Amsterdam/Philadelphia: John Benjamins.
Spence, D. P., & Owens, K. C. (1990). Lexical co-occurrence and association norms. Journal of Psycholinguistic Research, 19(5), 317–330.
Stubbs, M. (1995). Collocations and semantic prosodies: On the cause of the trouble with quantitative studies. Functions of Language, 2(1), 23–55.
Tummers, J., Heylen, K., & Geeraerts, D. (2005). Usage-based approaches in Cognitive Linguistics: A technical state of the art. Corpus Linguistics and Linguistic Theory, 1(2), 225–261.
