
Frequency, Dispersion, Association, and Keyness
Revising and tupleizing corpus-linguistic measures
This book revisits the main specifically corpus-linguistic statistics/measures the field has relied on for decades: frequency, dispersion, association, and keyness. It first discusses the purpose of these measures and how they have been computed. It then makes three main proposals. First, many measures of dispersion, association, and keyness are too confounded with frequency, and the book shows how to 'take frequency out of them' to obtain conceptually cleaner and more interpretable measures. Second, many existing measures can be replaced by a simple information-theoretic measure, the Kullback-Leibler divergence, which can likewise have frequency 'removed' from it. Third, corpus linguistics should abandon the tradition of describing its findings with a single number and adopt a tupleization approach instead, in which several separate dimensions of information are used for description and interpretation. The book is written in an informal, hands-on style and comes with its own R package featuring functions, example data, and several thousand lines of code exemplifying all applications.
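For readers unfamiliar with the Kullback-Leibler divergence mentioned above, the following is a minimal sketch of how it might be used as a dispersion measure. It is written in Python rather than R, it is not taken from the book or its R package, and the function name, the toy counts, and the choice of base-2 logarithms are all assumptions made for illustration. The idea sketched here is to compare the share of a word's occurrences found in each corpus part with the share of the corpus that each part makes up.

```python
import math


def kl_divergence(observed, expected):
    """Kullback-Leibler divergence D(observed || expected), in bits.

    Both arguments are sequences of proportions that each sum to 1.
    Every expected proportion must be positive. Terms with an observed
    proportion of 0 contribute 0 by the usual convention.
    """
    if len(observed) != len(expected):
        raise ValueError("distributions must have the same length")
    total = 0.0
    for p, q in zip(observed, expected):
        if p > 0:
            total += p * math.log2(p / q)
    return total


# Hypothetical toy data: one word's frequency in four equally sized corpus parts.
word_counts = [8, 1, 1, 0]
part_sizes = [1000, 1000, 1000, 1000]

# Observed: share of the word's occurrences in each part.
observed = [c / sum(word_counts) for c in word_counts]
# Expected: share of the whole corpus that each part makes up.
expected = [s / sum(part_sizes) for s in part_sizes]

clumpy = kl_divergence(observed, expected)  # word concentrated in one part
even = kl_divergence(expected, expected)    # word spread exactly like the corpus

print(f"clumpy word: {clumpy:.3f} bits")
print(f"evenly spread word: {even:.3f} bits")
```

In this sketch a value of 0 means the word is distributed exactly in proportion to the part sizes, and larger values indicate a more uneven distribution. How the book itself normalizes the measure or removes frequency from it is not shown here.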
[Studies in Corpus Linguistics, 115] 2024. vii, 321 pp.
Publishing status: Available
Published online on 19 June 2024
© John Benjamins
Table of Contents
- Chapter 1. Introduction | pp. 1–11
- Chapter 2. A review: Corpus statistics, the ‘usual’ approaches | pp. 12–79
- Chapter 3. Unification of measures | pp. 80–169
- Chapter 4. The role, and the ‘partialing out’, of frequency | pp. 170–228
- Chapter 5. Tupleization | pp. 229–268
- Chapter 6. What should be next | pp. 269–303
- Chapter 7. Conclusion | pp. 304–307
- References | pp. 308–318
- Index | pp. 319–321
“Frequency, Dispersion, Association, and Keyness is an articulate presentation of a number of important problems with corpus linguistic measures and how they are used. The solutions composed are a critical first step to addressing them and provide more avenues for research into dispersion, association, and keyness. This book is instructive for anyone using these measures to become better acquainted with them, including their problems and how we must be more careful in employing them.”
William C. X. Platt, Lancaster University, in International Journal of Corpus Linguistics, 30(3), 417-424 (2025).
Cited by 11 other publications
Gries, Stefan Th.
2025. Not just frequency. In Mathematical Modelling in Linguistics and Text Analysis [Current Issues in Linguistic Theory, 370], pp. 17 ff.
Gries, Stefan Th.
Gries, Stefan Th. & Stefanie Wulff
Hsiung, Nai-Huan, Chung-Fan Ni, Charles Silber, Justin Jacques & Cass Dykeman
Liao, Shengyu, Stefan Th. Gries & Stefanie Wulff
Schoonjans, Steven & Beatrix Schönherr
Zhan, Hongwei
Zhong, Yanlu, Simon Todd, Nicole Xu & Laurel Brehm
Hartmann, Stefan & Alexander Willich
This list is based on CrossRef data as of 3 December 2025. Please note that it may not be complete. Sources presented here have been supplied by the respective publishers. Any errors therein should be reported to them.