Article published in: Journal of Second Language Studies
Vol. 5:2 (2022), pp. 171–205
What do (most of) our dispersion measures measure (most)? Dispersion?
Published online: 30 November 2021
https://doi.org/10.1075/jsls.21029.gri
Abstract
This paper discusses the degree to which most of the most widely used measures of dispersion in corpus linguistics
are not particularly valid, in the sense that they do not actually measure dispersion but rather an amalgam of a lot of
frequency and a little dispersion. The paper demonstrates these issues on the basis of data from a variety of corpora. I then
outline how to design a dispersion measure that only measures dispersion and show that it (i) indeed measures information that is
different from frequency in an intuitive way and (ii) has a higher degree of predictive power for lexical decision times from the
MALD database than nearly all other measures in nearly all corpora tested.
Keywords: dispersion, frequency, association, range, Juilland’s D, Gries’s DP, generalized additive modeling
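The frequency confound described in the abstract can be illustrated with a minimal sketch of two of the measures named in the keywords: range and Gries's DP, the latter computed as half the sum of absolute differences between the share of a word's occurrences in each corpus part and that part's share of the corpus. The function names, the four equally sized corpus parts, and the word counts below are invented for illustration; this is not the paper's DPnofreq measure, whose definition is not given here.

```python
from typing import Sequence


def gries_dp(counts: Sequence[int], part_sizes: Sequence[int]) -> float:
    """Gries's DP: 0 means perfectly even dispersion, values near 1 mean clumped."""
    if len(counts) != len(part_sizes):
        raise ValueError("counts and part_sizes must have the same length")
    total_freq = sum(counts)
    total_size = sum(part_sizes)
    if total_freq == 0 or total_size == 0:
        raise ValueError("word frequency and corpus size must be positive")
    # Compare the observed share of the word in each part with the part's expected share.
    return 0.5 * sum(
        abs(c / total_freq - s / total_size) for c, s in zip(counts, part_sizes)
    )


def corpus_range(counts: Sequence[int]) -> int:
    """Range: the number of corpus parts in which the word occurs at least once."""
    return sum(1 for c in counts if c > 0)


# Hypothetical corpus of four equally sized parts (toy data, not from the paper).
part_sizes = [1000, 1000, 1000, 1000]
even_word = [5, 5, 5, 5]      # frequency 20, spread evenly
clumped_word = [20, 0, 0, 0]  # frequency 20, all in one part
rare_word = [1, 0, 0, 0]      # frequency 1, cannot be spread at all

for name, counts in [("even", even_word), ("clumped", clumped_word), ("rare", rare_word)]:
    print(name, corpus_range(counts), round(gries_dp(counts, part_sizes), 3))
```

In this toy setup the rare word receives the same DP and range as the clumped word simply because a single occurrence cannot be distributed over several parts, so the score partly reflects frequency rather than dispersion alone.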
Article outline
- 1. Introduction
- 2. A brief recap: G2 reacts more to frequency than to association
- 3. Dispersion measures: What do they measure and how?
- 3.1 Existing measures
- 3.2 A new measure: Motivation and development
- 3.3 Perspective 1: DPnofreq measures dispersion, not frequency
- 3.4 Perspective 2: DPnofreq helps predicting external data
- 4. Two short excurses
- 4.1 Excursus 1: rangenofreq
- 4.2 Excursus 2: fast bowler vs. fast food
- 5. Concluding remarks
- Notes
