Article published in: Journal of Second Language Studies
Vol. 5:1 (2022), pp. 1–33
What do (some of) our association measures measure (most)? Association?
Published online: 12 November 2021
https://doi.org/10.1075/jsls.21028.gri
Abstract
This paper discusses the degree to which some of the most widely used measures of association in corpus
linguistics are not particularly valid in the sense of actually measuring association rather than some amalgam of a lot of
frequency and a little association. The paper demonstrates these issues on the basis of hypothetical and actual corpus data and
outlines implications of the findings. I then outline how to design an association measure that only measures association and show
that its behavior supports the use of the log odds ratio as a true association-only measure, to be used separately from frequency. In
addition, this paper sets the stage for an analogous review of dispersion measures in corpus linguistics.
Keywords: association, frequency, dispersion, log-likelihood, t, MI, generalized additive modeling
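The contrast the abstract draws between frequency-sensitive measures and the log odds ratio can be illustrated with a minimal sketch. It is not the paper's own computation: the 2×2 co-occurrence counts are invented, and the function names `g2` and `log_odds_ratio` are hypothetical. The sketch assumes the standard log-likelihood ratio formula, G² = 2 Σ O · ln(O/E), and an unsmoothed log odds ratio, which is undefined when any cell is zero.

```python
import math


def g2(a, b, c, d):
    """Log-likelihood ratio G2 for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    observed = [a, b, c, d]
    # Expected counts under independence: row total * column total / n.
    expected = [
        (a + b) * (a + c) / n,
        (a + b) * (b + d) / n,
        (c + d) * (a + c) / n,
        (c + d) * (b + d) / n,
    ]
    # Cells with an observed count of 0 contribute nothing to the sum.
    return 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)


def log_odds_ratio(a, b, c, d):
    """Unsmoothed natural-log odds ratio; requires all four cells > 0."""
    return math.log((a * d) / (b * c))


# Invented counts (assumption): the second table is the first one times 10,
# so the proportions, and hence the association, are identical.
small = (10, 90, 40, 860)
large = tuple(10 * cell for cell in small)

print("G2:            ", g2(*small), g2(*large))
print("log odds ratio:", log_odds_ratio(*small), log_odds_ratio(*large))
```

Because every cell is multiplied by the same factor, G² grows by that factor while the log odds ratio is unchanged. This is one simple way in which a measure can respond to frequency without any change in association.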
Article outline
- 1. Introduction
- 2. The conflation of frequency and association
- 2.1 Hypothetical data and G²’s behavior
- 2.1.1 A collocation/collostruction example
- 2.1.2 A keyness example
- 2.2 Actual data and G²’s behavior
- 2.3 Actual data and t’s behavior
- 2.4 Actual data and MI’s behavior
- 3. Interim discussion
- 3.1 Some general remarks
- 3.2 MI, MI2, and MI3
- 3.3 A brief comment on (log) Dice
- 4. A new measure
- 4.1 Motivation and development
- 4.2 Application to fast
- 5. A small excursus
- 6. Concluding remarks
- Notes
