In: Frequency, Dispersion, Association, and Keyness: Revising and tupleizing corpus-linguistic measures
Stefan Th. Gries
[Studies in Corpus Linguistics 115] 2024
pp. v–viii
Published online: 4 July 2024
https://doi.org/10.1075/scl.115.toc
Table of contents
Chapter 1 Introduction 1
Chapter 2 A review: Corpus statistics, the ‘usual’ approaches 12
2.1 Frequency 13
2.1.1 How frequency is measured 13
2.1.2 What frequency is important for 19
2.2 Dispersion 23
2.2.1 How dispersion is measured 24
2.2.2 What dispersion is important for (corpus-linguistically) 32
2.2.3 What dispersion is important for (conceptually) 39
2.2.4 A (very brief!) first reaction to Nelson (2023) 43
2.3 Association 46
2.3.1 How association is measured 47
2.3.2 What association is important for 66
2.4 Keywords 68
2.4.1 How keyness is measured 69
2.4.2 What keyness is important for 78
Chapter 3 Unification of measures 80
3.1 Introduction: The Kullback-Leibler divergence 80
3.2 Characteristics of the KLD 87
3.2.1 Directionality 88
3.2.2 Independence of sample size 89
3.2.3 Normalization 90
3.2.4 Smoothing 92
3.3 Background and interpretation 99
3.4 Application to dispersion 104
3.4.1 A first simple example 104
3.4.2 A more realistic example 108
3.4.3 A (very brief!) second reaction to Nelson (2023) 113
3.5 Application to association 115
3.5.1 A first simple example 115
3.5.1.1 The direction row-to-column 116
3.5.1.2 The direction column-to-row 117
3.5.1.3 Quick discussion 118
3.5.2 A second simple example 119
3.5.2.1 The direction row/verb→column/construction 119
3.5.2.2 The direction column/construction→row/verb 120
3.5.3 A more realistic example 120
3.5.3.1 The direction row/verb→column/ditransitive 124
3.5.3.2 The direction column/ditransitive→row/verb 127
3.6 Application to key words 131
3.6.1 A first simple example 131
3.6.1.1 The direction row/word→column/corpus 133
3.6.1.2 The direction column/corpus→row/word 134
3.6.2 A more realistic example 135
3.6.2.1 The direction row/word→column/corpus 137
3.6.2.2 The direction column/corpus→row/word 140
3.7 A brief excursus on contributions to KLD 142
3.7.1 Contributions to KLD for keyness 142
3.7.2 Contributions to KLD for association 146
3.8 Application to concordancing 149
3.8.1 A first simple example 151
3.8.2 A more realistic example 153
3.8.3 Implications for frequency 158
3.9 Interesting extensions 160
3.9.1 Psycholinguistic applications (incl. prototypicality) 160
3.9.2 Synchronic and diachronic corpus homogeneity/comparisons 164
3.10 Interim summary 168
Chapter 4 The role, and the ‘partialing out’, of frequency 170
4.1 Dispersion and its correlation with frequency 171
4.2 Association and its correlation with frequency 177
4.3 Keyness and its correlation with frequency 193
4.4 Partialing out frequency 195
4.4.1 Dispersion without frequency 196
4.4.2 Association without frequency 208
4.5 Interim summary (and ominous implications?) 219
Chapter 5 Tupleization 229
5.1 Frequency and dispersion 231
5.2 Frequency and association 241
5.3 Frequency and association and dispersion 245
5.4 Keyness as frequency, association, and dispersion 251
Chapter 6 What should be next 269
6.1 Quantifying uncertainty 270
6.1.1 An example of bootstrapping 270
6.1.2 Excursus: On significance 273
6.2 Scaling things up 275
6.2.1 Speed: Scaling up with parallelization 276
6.2.2 Speed: Scaling up with Rcpp 279
6.2.2.1 Dispersion 281
6.2.2.2 Association 283
6.2.2.3 Keyness 284
6.2.3 Size: Scaling up with base 285
6.2.4 Size and speed: Scaling up with data.table 293
6.3 The dimensions to tupleize 297
6.3.1 Dimensions of information: Type frequencies & distributions 297
6.3.2 What are our tokens? 300
Chapter 7 Conclusion 304
References 308
Index
