How can we communicate (visually) what we (usually) mean by collocation and keyness?: A visual response to Gries (2022a)

Jeaco, Stephen

doi:10.1075/jsls.22019.jea

Commentary published in: Journal of Second Language Studies
Vol. 6:1 (2023), pp. 29–60

Stephen Jeaco | Xi’an Jiaotong-Liverpool University, China

Published online: 13 February 2023

Abstract

Corpus linguistic methods can now be easily employed in a wide range of studies within sub-disciplines of linguistics and well beyond. In a two-part paper, Gries (2022a, 2022b) challenges some of the most widely used ‘association measures’ of what many might feel to be powerful aspects of text patterning: collocation and key words. While the additional association measure offers some new possibilities, this paper highlights the strong influence of another frequency parameter on odds ratio and Gries’s suggested association measure, and questions the applicability of his cautions for many different kinds of corpus research. Nevertheless, having been inspired to look at different aspects of association and dispersion more carefully, the author presents some new visualizations which were designed to communicate some of the important lessons to be learned from Gries’s papers, especially for learners and teachers using corpus tools in Second Language classrooms.

Keywords: association, frequency, collocation, keyness, dispersion, log-likelihood, range, Gries’s DP

Article outline

1. Introduction
2. Definitions
3. Re-examining the roles of inputs for collocation measures
4. Keeping the Zipfian curve in mind for key words
5. Re-examining issues of dispersion measures
6. Conclusion
Note
References

References (37)

Anthony, L. (2022). AntConc (Version 4.0.1). Tokyo, Japan: Waseda University. Retrieved from [URL]

Bestgen, Y., & Granger, S. (2014). Quantifying the development of phraseological competence in L2 English writing: An automated approach. Journal of Second Language Writing, 26, 28–41.

Brezina, V., McEnery, T., & Wattam, S. (2015). Collocations in context: A new perspective on collocation networks. International Journal of Corpus Linguistics, 20(2), 139–173.

BNC. (2007). The British National Corpus (Version 3, BNC XML ed.). Oxford University Computing Services on behalf of the BNC Consortium. Retrieved from [URL]

Croft, W. B., Metzler, D., & Strohman, T. (2010). Search Engines: Information Retrieval in Practice. Boston: Addison-Wesley.

Dunning, T. (1993). Accurate methods for the statistics of surprise and coincidence. Computational Linguistics, 19(1), 61–74.

Evert, S., & Krenn, B. (2001). Methods for the qualitative evaluation of lexical association measures. Proceedings of the 39th Annual Meeting of the Association for Computational Linguistics (pp. 188–195).

Garside, R., & Smith, N. (1997). A hybrid grammatical tagger: CLAWS4. In R. Garside, G. Leech & A. McEnery (Eds.), Corpus Annotation: Linguistic Information from Computer Text Corpora (pp. 102–121). London: Longman.

Gries, S. T. (2013). 50-something years of work on collocations. International Journal of Corpus Linguistics, 18(1), 137–165.

Gries, S. T. (2015). The most under-used statistical method in corpus linguistics: multi-level (and mixed-effects) models. Corpora, 10(1), 95–125.

Gries, S. T. (2022a). What do (some of) our association measures measure (most)? Association? Journal of Second Language Studies, 5(1).

Gries, S. T. (2022b). What do (most of) our dispersion measures measure (most)? Dispersion? Journal of Second Language Studies.

Hann, M. N. (1973). The Statistical Force of Random Distribution. International Journal of Applied Linguistics, 201, 31–44.

Hardie, A. (2012). CQPweb: Combining Power, Flexibility and Usability in a Corpus Analysis Tool. International Journal of Corpus Linguistics, 17(3), 380–409.

Heaps, H. (1978). Information retrieval: Computational and theoretical aspects. New York: Academic Press.

Hoey, M. (2005). Lexical Priming: A New Theory of Words and Language. London: Routledge.

Hoey, M. (2014). Words and their neighbours. In J. R. Taylor (Ed.), Oxford Handbook of the Word. Oxford: Oxford University Press.

Hunston, S. (2002). Corpora in Applied Linguistics. Cambridge: Cambridge University Press.

Jeaco, S. (2017). Concordancing Lexical Primings: The rationale and design of a user-friendly corpus tool for English language teaching and self-tutoring based on the Lexical Priming theory of language. In M. Pace-Sigge & K. J. Patterson (Eds.), Lexical Priming: Applications and Advances (pp. 273–296). Amsterdam: John Benjamins.

Jeaco, S. (2020). Key words when text forms the unit of study: Sizing up the effects of different measures. International Journal of Corpus Linguistics, 25(2), 125–154.

Kilgarriff, A., Rychly, P., Smrz, P., & Tugwell, D. (2004). The Sketch Engine. Paper presented at the 2003 International Conference on Natural Language Processing and Knowledge Engineering, Beijing.

Mahlberg, M. (2013). Corpus Stylistics and Dickens’s Fiction. New York: Routledge.

Oakes, M. P. (1998). Statistics for Corpus Linguistics. Edinburgh: Edinburgh University Press.

O’Keeffe, A., McCarthy, M., & Carter, R. (2007). From Corpus to Classroom: Language Use and Language Teaching. Cambridge: Cambridge University Press.

Rayson, P., & Garside, R. (2000). Comparing corpora using frequency profiling. Paper presented at the Workshop on Comparing Corpora, Hong Kong University of Science and Technology, Hong Kong.

Read, T. R. C., & Cressie, N. A. C. (1988). Goodness-of-fit Statistics for Discrete Multivariate Data. New York: Springer-Verlag.

RStudio Team. (2022). RStudio: Integrated Development Environment for R. Boston, MA: RStudio, PBC. Retrieved from [URL]

Rychlý, P. (2008). A lexicographer-friendly association score. Paper presented at the Recent Advances in Slavonic Natural Language Processing Conference, Masaryk University, Brno.

Scott, M. (1997). PC analysis of key words – and key key words. System, 25(2), 233–245.

Scott, M., & Tribble, C. (2006). Textual Patterns: Key Words and Corpus Analysis in Language Education. Amsterdam: John Benjamins.

Scott, M. (2020). WordSmith Tools (Version 8). Oxford: Oxford University Press.

Scott, M. (2022). WordSmith Tools online manual “KeyWords: calculation”. Retrieved 31 October, 2022, from [URL]

Sinclair, J. M. (1991). Corpus, Concordance, Collocation. Oxford: Oxford University Press.

Sinclair, J. M. (2004). New evidence, new priorities, new attitudes. In J. M. Sinclair (Ed.), How to Use Corpora in Language Teaching (pp. 271–299). Amsterdam: John Benjamins.

Wermter, J., & Hahn, U. (2006). You can’t beat frequency (unless you use linguistic knowledge): A qualitative evaluation of association measures for collocation and term extraction. Paper presented at the Annual Meeting of the Association for Computational Linguistics, Sydney.

Wood, S. N. (2011). Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models. Journal of the Royal Statistical Society (B), 73(1), 3–36.

Zipf, G. K. (1935). The Psycho-Biology of Language: An Introduction to Dynamic Philology. Boston, MA: Houghton Mifflin.

Cited by (1)

Huo, J., & Jeaco, S. (2024). Using The Prime Machine to Untangle the Patterns of Academic Paraphrases. In English for Academic Purposes in the EMI Context in Asia (pp. 301 ff.).

This list is based on CrossRef data as of 13 November 2025. Please note that it may not be complete. Sources presented here have been supplied by the respective publishers. Any errors therein should be reported to them.