Frequency, Dispersion, Association, and Keyness

Revising and tupleizing corpus-linguistic measures

Stefan Th. Gries | University of California, Santa Barbara | Justus Liebig University Giessen

Hardbound – Available

ISBN 9789027214928 | EUR 115.00 | USD 149.00

e-Book –

ISBN 9789027246813 | EUR 115.00 | USD 149.00

This book revisits the main specifically corpus-linguistic statistics the field has relied on for decades: frequency, dispersion, association, and keyness. It first discusses the purpose of these measures and how they have been computed. It then makes three main proposals. First, many measures of dispersion, association, and keyness are too confounded with frequency, and the book shows how to 'take frequency out of them' to obtain conceptually cleaner and more interpretable measures. Second, many existing measures can be replaced by a simple information-theoretic measure, the Kullback-Leibler divergence, which can likewise have frequency 'removed' from it. Third, corpus linguistics should abandon the tradition of describing its findings with a single number and adopt a tupleization approach instead, in which several separate dimensions of information are used for description and interpretation. The book is written in an informal, hands-on style and comes with its own R package featuring functions, example data, and several thousand lines of code exemplifying all applications.
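To make the Kullback-Leibler proposal concrete, here is a minimal Python sketch. It is not taken from the book or its R package. It assumes one common setup for a dispersion-style measure: the observed share of a word's occurrences in each corpus part is compared with the share expected from the sizes of the parts. The function names `kl_divergence` and `dispersion_kl` and the toy counts are illustrative assumptions.

```python
import math


def normalize(counts):
    """Turn non-negative counts into proportions that sum to 1."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    return [c / total for c in counts]


def kl_divergence(p, q):
    """Kullback-Leibler divergence D_KL(p || q) in bits.

    Terms with p_i == 0 contribute 0 by convention; q_i == 0 with
    p_i > 0 is rejected because the divergence is then infinite.
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i > 0:
            if q_i == 0:
                raise ValueError("q_i is 0 where p_i is positive")
            total += p_i * math.log2(p_i / q_i)
    return total


def dispersion_kl(word_counts, part_sizes):
    """Hypothetical helper: divergence of a word's distribution across
    corpus parts from the distribution expected given the part sizes."""
    observed = normalize(word_counts)   # where the word actually occurs
    expected = normalize(part_sizes)    # where it would occur if spread evenly
    return kl_divergence(observed, expected)


# Toy corpus with three equally sized parts (made-up numbers).
sizes = [100, 100, 100]
even = dispersion_kl([10, 10, 10], sizes)    # word spread evenly
clumped = dispersion_kl([30, 0, 0], sizes)   # word confined to one part
print(even, clumped)
```

Under these assumptions, a word spread in proportion to the part sizes yields a divergence of zero, and the value grows as the word clusters in fewer parts.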

[Studies in Corpus Linguistics, 115] 2024. vii, 321 pp.

Publishing status: Available
Published online on 19 June 2024

https://doi.org/10.1075/scl.115

Table of Contents

Chapter 1. Introduction | pp. 1–11

Chapter 2. A review: Corpus statistics, the ‘usual’ approaches | pp. 12–79

Chapter 3. Unification of measures | pp. 80–169

Chapter 4. The role, and the ‘partialing out’, of frequency | pp. 170–228

Chapter 5. Tupleization | pp. 229–268

Chapter 6. What should be next | pp. 269–303

Chapter 7. Conclusion | pp. 304–307

References | pp. 308–318

Index | pp. 319–321

“Frequency, Dispersion, Association, and Keyness is an articulate presentation of a number of important problems with corpus linguistic measures and how they are used. The solutions composed are a critical first step to addressing them and provide more avenues for research into dispersion, association, and keyness. This book is instructive for anyone using these measures to become better acquainted with them, including their problems and how we must be more careful in employing them.”

William C. X. Platt, Lancaster University, in International Journal of Corpus Linguistics, 30(3), 417–424 (2025).

Cited by 11 other publications

Gries, Stefan Th.

2025. Not just frequency. In Mathematical Modelling in Linguistics and Text Analysis [Current Issues in Linguistic Theory, 370], ► pp. 17 ff.

Gries, Stefan Th.

2025. Cultural Keywords in Varieties Research. Journal of Research Design and Statistics in Linguistics and Communication Science

Gries, Stefan Th. & Stefanie Wulff

2025. Introduction to the special issue on collostructions. Corpus Linguistics and Linguistic Theory 21:3 ► pp. 465 ff.

Hsiung, Nai-Huan, Chung-Fan Ni, Charles Silber, Justin Jacques & Cass Dykeman

2025. Filial Care in Transition: Linguistic and Emotional Patterns in Online Discourse Among Emerging Adults in Taiwan. Behavioral Sciences 15:10 ► pp. 1417 ff.

Liao, Shengyu, Stefan Th. Gries & Stefanie Wulff

2025. Transfer five ways: applications of multiple distinctive collexeme analysis to the dative alternation in Mandarin Chinese. Corpus Linguistics and Linguistic Theory 21:3 ► pp. 517 ff.

Schoonjans, Steven & Beatrix Schönherr

2025. Frequency issues in multimodal Construction Grammar revisited. Nota Bene 2:1 ► pp. 200 ff.

Soenning, Lukas

2025.

Zhan, Hongwei

2025. Key cluster identification in literary texts using and comparing multiple measures: an exploratory comparative study and its implications. Digital Scholarship in the Humanities 40:2 ► pp. 668 ff.

Zhong, Yanlu, Simon Todd, Nicole Xu & Laurel Brehm

2025. Evaluating LLMs as proxies for humans in psycholinguistic ratings: A comparison of statistical knowledge. Research Methods in Applied Linguistics 4:3 ► pp. 100274 ff.

Hartmann, Stefan & Alexander Willich

2024. Collostructional Analysis Meets Construction Semantics: Revisiting the English Way-Construction and Its German Equivalents. Zeitschrift für Anglistik und Amerikanistik 72:3 ► pp. 319 ff.

Uhrig, Peter & Thomas Herbst

2024. How Collostructional Analysis Contributes to the Description of Argument Structure Constructions with Slots for that- and Infinitive Clauses. Zeitschrift für Anglistik und Amerikanistik 72:3 ► pp. 213 ff.

This list is based on CrossRef data as of 3 December 2025. Please note that it may not be complete. Sources presented here have been supplied by the respective publishers. Any errors therein should be reported to them.

Subjects and metadata

U.S. Library of Congress Control Number: 2024023008

Linguistics
