Commentary published in: Journal of Second Language Studies
Vol. 6:1 (2023), pp. 29–60
How can we communicate (visually) what we (usually) mean by collocation and keyness?
A visual response to Gries (2022a)
Published online: 13 February 2023
https://doi.org/10.1075/jsls.22019.jea
Abstract
Corpus linguistic methods can now be easily employed in a wide range of studies within sub-disciplines of
linguistics and well beyond. In a two-part paper, Gries (2022a, 2022b) challenges some of the most widely used ‘association measures’ of what many might feel to be
powerful aspects of text patterning: collocation and key words. While the additional association measure offers some new
possibilities, this paper highlights the strong influence of another frequency parameter on odds ratio and Gries’s suggested
association measure, and questions the applicability of his cautions for many different kinds of corpus research. Nevertheless,
having been inspired to look at different aspects of association and dispersion more carefully, the author presents some new
visualizations which were designed to communicate some of the important lessons to be learned from Gries’s papers, especially for
learners and teachers using corpus tools in Second Language classrooms.
Keywords: association, frequency, collocation, keyness, dispersion, log-likelihood, range, Gries’s DP
Article outline
- 1. Introduction
- 2. Definitions
- 3. Re-examining the roles of inputs for collocation measures
- 4. Keeping the Zipfian curve in mind for key words
- 5. Re-examining issues of dispersion measures
- 6. Conclusion
- Note
References
Anthony, L. (2022). AntConc (Version 4.0.1). Tokyo, Japan: Waseda University. Retrieved from [URL]
Bestgen, Y., & Granger, S. (2014). Quantifying the development of phraseological competence in L2 English writing: An automated approach. Journal of Second Language Writing, 26(1), 28–41.
Brezina, V., McEnery, T., & Wattam, S. (2015). Collocations in context: A new perspective on collocation networks. International Journal of Corpus Linguistics, 20(2), 139–173.
BNC. (2007). The British National Corpus (Version 3, BNC XML ed.). Oxford University Computing Services on behalf of the BNC Consortium. URL: [URL]
Croft, W. B., Metzler, D., & Strohman, T. (2010). Search Engines: Information Retrieval in Practice. Boston: Addison-Wesley.
Dunning, T. (1993). Accurate methods for the statistics of surprise and coincidence. Computational Linguistics, 19(1), 61–74.
Evert, S., & Krenn, B. (2001). Methods for the qualitative evaluation of lexical association measures. Proceedings of the 39th Annual Meeting of the Association for Computational Linguistics, pp. 188–195.
Garside, R., & Smith, N. (1997). A hybrid grammatical tagger: CLAWS4. In R. Garside, G. Leech & A. McEnery (Eds.), Corpus Annotation: Linguistic Information from Computer Text Corpora (pp. 102–121). London: Longman.
Gries, S. T. (2013). 50-something years of work on collocations. International Journal of Corpus Linguistics, 18(1), 137–165.
Gries, S. T. (2015). The most under-used statistical method in corpus linguistics: multi-level (and mixed-effects) models. Corpora, 10(1), 95–125.
Gries, S. (2022a). What do (some of) our association measures measure (most)? Association? Journal of Second Language Studies, 5(1).
Gries, S. (2022b). What do (most of) our dispersion measures measure (most)? Dispersion? Journal of Second Language Studies.
Hann, M. N. (1973). The Statistical Force of Random Distribution. International Journal of Applied Linguistics, 20(1), 31–44.
Hardie, A. (2012). CQPweb: Combining Power, Flexibility and Usability in a Corpus Analysis Tool. International Journal of Corpus Linguistics, 17(3), 380–409.
Heaps, H. (1978). Information retrieval: Computational and theoretical aspects. New York: Academic Press.
(2014). Words and their neighbours. In J. R. Taylor (Ed.), Oxford Handbook of the Word. Oxford: Oxford University Press.
Jeaco, S. (2017). Concordancing Lexical Primings: The rationale and design of a user-friendly corpus tool for English language teaching and self-tutoring based on the Lexical Priming theory of language. In M. Pace-Sigge & K. J. Patterson (Eds.), Lexical Priming: Applications and Advances (pp. 273–296). Amsterdam: John Benjamins.
Jeaco, S. (2020). Key words when text forms the unit of study: Sizing up the effects of different measures. International Journal of Corpus Linguistics, 25(2), 125–154.
Kilgarriff, A., Rychly, P., Smrz, P., & Tugwell, D. (2004). The Sketch Engine. Paper presented at the 2003 International Conference on Natural Language Processing and Knowledge Engineering, Beijing.
O’Keeffe, A., McCarthy, M., & Carter, R. (2007). From Corpus to Classroom: Language Use and Language Teaching. Cambridge: Cambridge University Press.
Rayson, P., & Garside, R. (2000). Comparing corpora using frequency profiling. Paper presented at the Workshop on Comparing Corpora, Hong Kong University of Science and Technology, Hong Kong.
Read, T. R. C., & Cressie, N. A. C. (1988). Goodness-of-fit Statistics for Discrete Multivariate Data. New York: Springer-Verlag.
RStudio Team. (2022). RStudio: Integrated Development Environment for R. Boston, MA: PBC. Retrieved from [URL]
Rychlý, P. (2008). A lexicographer-friendly association score. Paper presented at the Recent Advances in Slavonic Natural Language Processing Conference, Masaryk University, Brno.
Scott, M., & Tribble, C. (2006). Textual Patterns: Key Words and Corpus Analysis in Language Education. Amsterdam: John Benjamins.
(2022). WordSmith Tools online manual “KeyWords: calculation”. Retrieved 31 October, 2022, from [URL]
(2004). New evidence, new priorities, new attitudes. In J. M. Sinclair (Ed.), How to Use Corpora in Language Teaching (pp. 271–299). Amsterdam: John Benjamins.
Wermter, J., & Hahn, U. (2006). You can’t beat frequency (unless you use linguistic knowledge): A qualitative evaluation of association measures for collocation and term extraction. Paper presented at the Annual Meeting of the Association for Computational Linguistics, Sydney.
