In: Frequency, Dispersion, Association, and Keyness: Revising and tupleizing corpus-linguistic measures
Stefan Th. Gries
[Studies in Corpus Linguistics 115] 2024
pp. 308–318
References
Published online: 4 July 2024
https://doi.org/10.1075/scl.115.refs
Ackermann, Kirsten & Yu-Hua Chen. 2013. Developing
the Academic Collocation List: A corpus-driven and expert-judged approach. Journal of English for
Academic
Purposes 12(4). 235–247.
Adelman, James S., Gordon D. A. Brown, & José F. Quesada. 2006. Contextual
Diversity, not word frequency, determines word-naming and lexical decision times. Psychological
Science 17(9). 814–823.
Adèr, Herman J. 2008. Modelling. In Herman J. Adèr & Gideon J. Mellenbergh (eds.), Advising
on research methods: A consultant’s
companion, 271–304. Huizen: Johannes van Kessel Publishing.
Ambridge, Ben, Anna L. Theakston, Elena V. M. Lieven, & Michael Tomasello. 2006. The
distributed learning effect for children’s acquisition of an abstract syntactic
construction. Cognitive
Development 21(2). 174–193.
Archer, Dawn (ed.). 2009. What’s
in a word-list? Investigating word frequency and keyword
extraction. London: Routledge.
Arppe, Antti. 2008. Univariate,
bivariate and multivariate methods in corpus-based lexicography – A study of synonymy. Ph.D.
dissertation, University of Helsinki.
Aslin, Richard N. & Elissa L. Newport. 2012. Statistical
learning: From acquiring specific items to forming general rules. Current Directions in Psychological
Science 21(3). 170–176.
Baayen, R. Harald. 2010. Demythologizing the word
frequency effect: A discriminative learning perspective. The Mental
Lexicon 5(3). 436–461.
Baayen, R. Harald. 2011. Corpus linguistics and naive
discriminative learning. Brazilian Journal of Applied
Linguistics 11(2). 295–328.
Baayen, R. Harald, Petar Milin, Dusica Filipović-Đurđević, Peter Hendrix, & Marco Marelli. 2011. An
amorphous model for morphological processing in visual comprehension based on naive discriminative
learning. Psychological
Review 118(3). 438–481.
Babych, Bogdan & Anthony Hartley. 2011. Meta-evaluation
of comparability metric using parallel corpora. International Journal of Computational Linguistics
and
Applications 2(1–2). 209–222.
Baguley, Thom. 2012. Serious
stats: A guide to advanced statistics for the behavioral
sciences. Houndmills: Palgrave Macmillan.
Baker, Paul. 2004. Querying
keywords: Questions in difference, frequency, and sense in keyword analysis. Journal of English
Linguistics 32(4). 346–359.
Baker, Paul & Tony McEnery. 2005. A
corpus-based approach to discourses of refugees and asylum seekers in UN and newspaper texts. Journal
of Language and
Politics 4(2). 197–226.
Baron, Alistair, Paul Rayson, & Dawn Archer. 2009. Word
frequency and keyword statistics in historical corpus linguistics. Anglistik: International Journal
of English
Studies 20(1). 41–67.
Bavaud, François. 2009. Information
theory, relative entropy and statistics. In G. Sommaruga (ed.), Formal
theories of information: Lecture notes in computer
science, 54–78. Berlin: Springer.
Belov, Dmitry I. & Ronald D. Armstrong. 2011. Distributions
of the Kullback-Leibler divergence with applications. British Journal of Mathematical and Statistical
Psychology 64(2). 291–309.
Berger, Cynthia, Scott Crossley, & Stephen Skalicky. 2019. Using
lexical features to investigate second language lexical decision performance. Studies in Second
Language
Acquisition 41(5). 911–935.
Berry-Rogghe, Godelieve L. M. 1974. Automatic identification of phrasal
verbs. In John L. Mitchell (ed.), Computers
in the
humanities, 16–26. Edinburgh: Edinburgh University Press.
Bestgen, Yves & Sylviane Granger. 2014. Quantifying
the development of phraseological competence in L2 English writing: An automated approach. Journal of
Second Language Writing 26. 28–41.
Biber, Douglas, Randi Reppen, Erin Schnur, & Romy Ghanem. 2016. On
the (non)utility of Juilland’s D to measure lexical dispersion in large
corpora. International Journal of Corpus
Linguistics 21(4). 439–464.
Bondi, Marina & Mike Scott (eds.). 2010. Keyness
in texts. Amsterdam: John Benjamins.
Bortz, Jürgen, Gustav A. Lienert, & Klaus Boehnke. 2008. Verteilungsfreie
Methoden in der Biostatistik. 3rd corr.
ed. Heidelberg: Springer Medizin Verlag.
Bouma, Gerlof. 2009. Normalized
(Pointwise) Mutual Information in collocation extraction. Proceedings of the Biennial GSCL
Conference 30. 31–40.
Bresnan, Joan, Anna Cueni, Tatiana Nikitina, & R. Harald Baayen. 2007. Predicting
the dative alternation. In Gerlof Bouma, Irene Kraemer, & Joost Zwarts (eds.), Cognitive
foundations of
interpretation, 69–94. Amsterdam: Royal Netherlands Academy of Arts and Sciences.
Brezina, Vaclav & Miriam Meyerhoff. 2014. Significant
or random? A critical review of sociolinguistic generalisations based on large corpora. International
Journal of Corpus
Linguistics 19(1). 1–28.
Brysbaert, Marc & Boris New. 2009. Moving
beyond Kučera and Francis: A critical evaluation of current word frequency norms and the introduction of a new and improved word
frequency measure for American English. Behavior Research
Methods 41(4). 977–990.
Burch, Brent, Jesse Egbert, & Douglas Biber. 2017. Measuring
and interpreting lexical dispersion in corpus linguistics. Journal of Research Design and Statistics
in Linguistics and Communication
Science 3(2). 189–216.
Burnham, Kenneth P. & David R. Anderson. 2001. Kullback-Leibler
information as a basis for strong inference in ecological studies. Wildlife
Research 28(2). 111–119.
Bybee, Joan & Sandra A. Thompson. 1997. Three
frequency effects in syntax. Berkeley Linguistics
Society 23. 65–85.
Carroll, John B. 1970. An alternative to Juilland’s usage
coefficient for lexical frequencies and a proposal for a standard frequency index. Computer Studies
in the Humanities and Verbal
Behaviour 3(2). 61–65.
Charles, Walter G. & George A. Miller. 1989. Contexts
of antonymous adjectives. Applied
Psycholinguistics 10(3). 357–375.
Chen, Stanley F. & Joshua Goodman. 1999. An
empirical study of smoothing techniques for language modeling. Computer Speech and
Language 13(4). 359–394.
Church, Kenneth W. 2000. Empirical estimates of adaptation: The
chance of two Noriegas is closer to p/2 than
p². In Proceedings of the COLING
2000 (The 18th international conference on computational
linguistics). np.
Church, Kenneth W., William Gale, Patrick Hanks, & Douglas Hindle. 1991. Using
statistics in lexical analysis. In Uri Zernik (ed.), Lexical
acquisition: Exploiting on-line resources to build a
lexicon, 115–164. Hillsdale, NJ: Lawrence Erlbaum Associates.
Church, Kenneth W. & Patrick Hanks. 1990. Word
association norms, mutual information, and lexicography. Computational
Linguistics 16(1). 22–29.
Collins, Peter. 2021. Cultural
keywords in World Englishes: A GloWbE-based study. ICAME
Journal 45. 5–35.
Cover, Thomas M. & Joy A. Thomas. 2006. Elements
of information theory. 2nd ed. Hoboken, NJ: John Wiley.
Culpeper, Jonathan. 2002. Computers,
language and characterisation: An analysis of six characters in Romeo and
Juliet. In Ulla Melander-Marttala, Carin Östman, & Merja Kytö (eds.), Conversation
in life and in
literature, 11–30. Uppsala: Association Suédoise de Linguistique Appliquée.
Culpeper, Jonathan. 2009. Words,
parts-of-speech and semantic categories in the character-talk of Shakespeare’s Romeo and
Juliet. International Journal of Corpus
Linguistics 14(1). 29–59.
Cvrček, Václav & Masako Fidler. 2019. More
than keywords: Discourse prominence analysis of the Russian web portal Sputnik Czech
Republic. In M. Berrocal & A. Salamurović (eds.), Political
discourse in Central, Eastern and Balkan
Europe, 93–117. Amsterdam: John Benjamins.
Damerau, Frederick J. 1990. Evaluating computer-generated
domain-oriented vocabularies. Information Processing and
Management 26(6). 791–801.
Damerau, Frederick J. 1993. Generating and evaluating
domain-oriented multi-word terms from texts. Information Processing and
Management 29(4). 433–447.
Daudaravičius, Vidas & Ruta Marcinkevičienė. 2004. Gravity
counts for the boundaries of collocations. International Journal of Corpus
Linguistics 9(2). 321–348.
Degaetano-Ortlieb, Stefania & Elke Teich. 2016. Information-based
modeling of diachronic linguistic change: From typicality to
productivity. In Proceedings of the 10th SIGHUM Workshop on Language
Technology for Cultural Heritage, Social Sciences, and Humanities
(LaTeCH), 165–173. Berlin.
Degaetano-Ortlieb, Stefania & Elke Teich. 2022. Toward
an optimal code for communication: The case of scientific English. Corpus Linguistics and Linguistic
Theory 18(1). 175–207.
Davies, Mark & Dee Gardner. 2010. A
frequency dictionary of contemporary American English: Word sketches, collocates and thematic
lists. London: Routledge.
Do, Youngah & Ryan Ka Yau Lai. 2019. Large-sample
confidence intervals of information-theoretic measures in linguistics. Journal of Research Design and
Statistics in Linguistics and Communication
Science 6(1). 19–54.
Dunning, Ted. 1993. Accurate
methods for the statistics of surprise and coincidence. Computational
Linguistics 19(1). 61–74.
Durlak, Joseph A. 2009. How to select, calculate, and
interpret effect sizes. Journal of Pediatric
Psychology 34(9). 917–928.
Durrant, Phil & Norbert Schmitt. 2009. To
what extent do native and non-native writers make use of collocations? International Review of
Applied Linguistics 47. 157–177.
Edmundson, Harold P. & W. Wyllys. 1961. Automatic
abstracting and indexing – Survey and recommendations. Communications of the
ACM 4. 226–234.
Egbert, Jesse & Douglas Biber. 2019. Incorporating
text dispersion into keyword
analyses. Corpora 14(1). 77–104.
Ellis, Nick C. 2006. Language acquisition as rational
contingency learning. Applied
Linguistics 27(1). 1–24.
Ellis, Nick C., Ute Römer, & Matthew Brook O’Donnell. 2016. Usage-based
approaches to language acquisition and processing. New York, NY: Wiley-Blackwell.
Ellis, Nick C. & Rita Simpson-Vlach. 2005. An
academic formulas list (AFL): Extraction, validation, prioritization. Paper presented
at Phraseology 2005, Université Catholique
Louvain-la-Neuve.
Ellis, Nick C., Rita Simpson-Vlach, & Carson Maynard. 2007. The
processing of formulas in native and L2 speakers: psycholinguistic and corpus determinants. Paper
presented at the Symposium on Formulaic Language, University of
Wisconsin-Milwaukee.
Eskridge, William N., Brian G. Slocum, & Stefan Th. Gries. 2021. The
meaning of sex: Dynamic words, novel applications, and original public
meaning. Michigan Law
Review 119(7). 1503–1580.
Evert, Stefan. 2009. Corpora
and collocations. In Anke Lüdeling & Merja Kytö (eds.), Corpus
linguistics: An international
handbook, Vol. 2, 1212–1248. Berlin: Mouton de Gruyter.
Evert, Stefan & Brigitte Krenn. 2001. Methods
for the qualitative evaluation of lexical association
measures. In Proceedings of the 39th Annual Meeting of the Association for
Computational
Linguistics, 188–195. Toulouse.
Fankhauser, Peter, Jörg Knappen, & Elke Teich. 2014. Exploring
and visualizing variation in language resources. In Proceedings of the 9th
International Conference on Language Resources and Evaluation
(LREC’14), 4125–4128.
Fidler, Masako & Václav Cvrček. 2015. A
data-driven analysis of reader viewpoints: reconstructing the historical reader using keyword
analysis. Journal of Slavic
Linguistics 23(2). 197–239.
Francis, W. Nelson & Henry Kučera. 1982. Frequency
analysis of English usage: Lexicon and grammar. Boston, MA: Houghton Mifflin.
Forster, Kenneth I. & Susan M. Chambers. 1973. Lexical
access and naming time. Journal of Verbal Learning and Verbal
Behavior 12(6). 627–635.
Fidelholtz, James L. 1975. Word frequency and vowel reduction in
English. Chicago Linguistic
Society 11. 200–213.
Gabrielatos, Costas. 2018. Keyness
analysis: Nature, metrics and techniques. In Charlotte Taylor & Anna Marchi (eds.), Corpus
approaches to discourse: A critical
review, 225–258. London: Routledge.
Gale, William A. & Geoffrey Sampson. 1995. Good-Turing
frequency estimation without tears. Journal of Quantitative
Linguistics 2(3). 217–237.
Gardner, Dee & Mark Davies. 2014. A
new Academic Vocabulary List. Applied
Linguistics 35(3). 305–327.
Glenberg, Arthur M. 1976. Monotonic and nonmonotonic lag effects
in paired-associate and recognition memory paradigms. Journal of Verbal Learning and Verbal
Behavior 15(1). 1–15.
Glenberg, Arthur M. 1979. Component-levels theory of the effects
of spacing of repetitions on recall and recognition. Memory and
Cognition 7(2). 95–112.
Goldberg, Adele E. 1995. Constructions: A Construction Grammar
approach to argument structure. Chicago, IL: The University of Chicago Press.
Goldberg, Adele E., Devin M. Casenhiser, & Nitya Sethuraman. 2004. Learning
argument structure generalizations. Cognitive
Linguistics 15(3). 289–316.
Gómez, Rebecca L. 2002. Variability and detection of invariant
structure. Psychological
Science 13(5). 431–436.
Gries, Stefan Th. 2003. Towards a corpus-based identification
of prototypical instances of constructions. Annual Review of Cognitive
Linguistics 1. 1–27.
Gries, Stefan Th. 2005. Null-hypothesis significance testing of word frequencies: A
follow-up on Kilgarriff. Corpus Linguistics and Linguistic
Theory 1(2). 277–294.
Gries, Stefan Th. 2006. Exploring variability within and
between corpora: Some methodological
considerations. Corpora 1(2). 109–151.
Gries, Stefan Th. 2008. Dispersions and adjusted frequencies
in corpora. International Journal of Corpus
Linguistics 13(4). 403–437.
Gries, Stefan Th. 2010. Dispersions and adjusted frequencies
in corpora: Further explorations. In Stefan Th. Gries, Stefanie Wulff, & Mark Davies (eds.), Corpus
linguistic applications: Current studies, new
directions, 197–212. Amsterdam: Rodopi.
Gries, Stefan Th. 2013. 50-something years of work on
collocations: What is or should be next … International Journal of Corpus
Linguistics 18(1). 137–165.
Gries, Stefan Th. 2016. Quantitative corpus linguistics with
R. 2nd rev. & ext. ed. New York &
London: Routledge, pp. 274.
Gries, Stefan Th. 2018. The discriminatory power of lexical
context for alternations: An information-theoretic exploration. Journal of Research Design and
Statistics in Linguistics and Communication
Science 5(1–2). 78–106.
Gries, Stefan Th. 2019a. 15 years of collostructions: Some
long overdue additions/corrections (to/of actually all sorts of corpus-linguistics
measures). International Journal of Corpus
Linguistics 24(3). 385–412.
Gries, Stefan Th. 2019b. Ten lectures on corpus-linguistic
approaches: Applications for usage-based and psycholinguistic
research. Leiden: Brill.
Gries, Stefan Th. 2020. Analyzing
dispersion. In Magali Paquot & Stefan Th. Gries (eds.), A
practical handbook of corpus
linguistics, 99–118. Berlin: Springer.
Gries, Stefan Th. 2021b. A new approach to (key) keywords
analysis: Using frequency, and now also dispersion. Research in Corpus
Linguistics 9(2). 1–33.
Gries, Stefan Th. 2022a. What do (some of) our association
measures measure (most)? Association? Journal of Second Language
Studies 5(1). 1–33.
Gries, Stefan Th. 2022b. What do (most of) our dispersion
measures measure (most)? Dispersion? Journal of Second Language
Studies 5(2). 171–205.
Gries, Stefan Th. 2022c. Towards more careful corpus
statistics: Uncertainty estimates for frequencies, dispersions, association measures, and
more. Research Methods in Applied
Linguistics 1(1).
Gries, Stefan Th. 2022d. Multi-word units (and tokenization
more generally): A multi-dimensional and largely information-theoretic
approach. Lexis 19.
Gries, Stefan Th. 2024. Corrections to Nelson (2023):
DPnorm and DKLnorm are not wrong on pi at
all. Journal of Quantitative Linguistics.
Gries, Stefan Th. To appear. Cultural keywords in
varieties research: Some suggestions to extend existing work. World
Englishes.
Gries, Stefan Th., Beate Hampe, & Doris Schönefeld. 2005. Converging
evidence: Bringing together experimental and corpus data on the association of verbs and
constructions. Cognitive
Linguistics 16(4). 635–676.
Gries, Stefan Th. & Joybrato Mukherjee. 2010. Lexical
gravity across varieties of English: An ICE-based study of n-grams in Asian Englishes. International
Journal of Corpus
Linguistics 15(4). 520–548.
Groom, Nicholas. 2009. Effects
of second language immersion on second language collocational
development. In Andy Barfield & Henrik Gyllstad (eds.), Researching
collocations in another
language, 21–33. Houndmills: Palgrave Macmillan.
Hackstein, Olav & Ryan Sandell. 2023. The
rise of colligations: English can’t stand and German nicht ausstehen
können. International Journal of Corpus
Linguistics 28(1). 60–90.
Hilpert, Martin & Stefan Th. Gries. 2009. Assessing
frequency changes in multi-stage diachronic corpora: Applications for historical corpus linguistics and the study of language
acquisition. Literary and Linguistic
Computing 24(4). 385–401.
Hoffman, Elaine B., Pranab K. Sen, & Clarice R. Weinberg. 2001. Within-cluster
resampling. Biometrika 88(4). 1121–1134.
Howes, Davis H. & Richard L. Solomon. 1951. Visual
duration threshold as a function of word probability. Journal of Experimental
Psychology 41(6). 401–410.
James, Gareth, Daniela Witten, Trevor Hastie, & Robert Tibshirani. 2021. An
introduction to statistical learning with applications in R. 2nd
ed. Berlin: Springer.
Juilland, Alphonse G., Dorothy R. Brodin, & Catherine Davidovitch. 1970. Frequency
dictionary of French words. The Hague: Mouton de Gruyter.
Juilland, Alphonse & E. Chang-Rodriguez. 1964. Frequency
dictionary of Spanish words. The Hague: Mouton de Gruyter.
Justeson, John S. & Slava M. Katz. 1991. Co-occurrences
of antonymous adjectives and their contexts. Computational
Linguistics 17(1). 1–20.
. 1986. Frequency
considerations in morphology. Zeitschrift für Phonetik, Sprachwissenschaft und
Kommunikationsforschung 39(1). 19–28.
Koplenig, Alexander. 2017. A
data-driven method to identify (correlated) changes in chronological corpora. Journal of Quantitative
Linguistics 24(4). 289–318.
Kullback, Solomon & Richard A. Leibler. 1951. On
information and sufficiency. Annals of Mathematical
Statistics 22(1). 79–86.
Kuperman, Victor, Hans Stadthagen-Gonzalez, & Marc Brysbaert. 2012. Age-of-acquisition
ratings for 30,000 English words. Behavior Research
Methods 44. 978–990.
Kyle, Kristopher & Scott A. Crossley. 2015. Automatically
assessing lexical sophistication: Indices, tools, findings, and application. TESOL
Quarterly 49(4). 757–786.
Lachman, Roy. 1973. Uncertainty
effects on time to access the internal lexicon. Journal of Experimental
Psychology 99(2). 199–208.
Langacker, Ronald W. 1987. Foundations of Cognitive Grammar I:
Theoretical prerequisites. Stanford, CA: Stanford University Press.
Langenhorst, Jan, Yannick Frommherz, & Simon Meier-Vieracker. 2023. Keyness
in song lyrics: Challenges of highly clumpy data. Journal for Language Technology and Computational
Linguistics 36(1). 21–38.
Leech, Geoffrey, Paul Rayson, & Andrew Wilson. 2001. Word
frequencies in written and spoken English: Based on the British National
Corpus. London: Longman.
Leech, Geoffrey & Roger Fallon. 1992. Computer
corpora – What do they tell us about culture? ICAME
Journal 16. 29–50.
Lester, Nicholas A. 2017. The syntactic bits of nouns: How prior
syntactic distributions affect comprehension, production, and acquisition. Ph.D.
dissertation, University of California, Santa Barbara.
Lester, Nicholas A., Daniel Baum, & Tirza Biron. 2018. Phonetic
duration of nouns depends on de-lexicalized syntactic distributions: Evidence from naturally occurring
conversation. In Chuck Kalish, Martina Rau, Jerry Zhu, & Timothy Rogers (eds.), Proceedings
of the 40th Annual Conference of the Cognitive Science
Society, 2035–2040. Madison, WI.
Lester, Nicholas A., Laurie B. Feldman, & Fermín Moscoso del Prado Martín. 2017. You
can take a noun out of syntax…: Syntactic similarity effects in lexical
priming. In Glenn Gunzelmann, Andrew Howes, Thora Tenbrink, & Eddy Davelaar (eds.), Proceedings
of the 39th Annual Conference of the Cognitive Science
Society, 2537–2542. London, UK.
Lester, Nicholas A. & Fermín Moscoso del Prado Martín. 2017. Syntactic
flexibility in the noun: evidence from picture naming. In Anna Papafragou, Daniel Grodner, Daniel Mirman, & John C. Trueswell (eds.), Proceedings
of the 38th Annual Conference of the Cognitive Science
Society, 2585–2590. Philadelphia, PA.
Lijffijt, Jefrey & Stefan Th. Gries. 2012. Correction
to “Dispersions and adjusted frequencies in corpora”. International Journal of Corpus
Linguistics 17(1). 147–149.
Lim, Zheng Wei, Harry Stuart, Simon De Deyne, Terry Regier, Ekaterina Vylomova, Trevor Cohn, & Charles Kemp. 2022. A
computational approach to discovering cultural keywords across
languages. PsyArXiv, last edited 22 Nov 2022.
Linzen, Tal & T. Florian Jaeger. 2015. Uncertainty
and expectation in sentence processing: Evidence from subcategorization distributions. Cognitive
Science 40(6). 1382–1411.
Linzen, Tal, Alec Marantz, & Liina Pylkkänen. 2013. Syntactic
context effects in visual word recognition: An MEG study. The Mental
Lexicon 8(2). 117–139.
Mahlberg, Michaela. 2007. Clusters,
key clusters and local textual functions in
Dickens. Corpora 2(1). 1–31.
McConnell, Kyla & Alice Blumenthal-Dramé. 2022. Effects
of task and corpus-derived association scores on the online processing of collocations. Corpus
Linguistics and Linguistic
Theory 18(1). 33–76.
McDonald, Scott A. & Richard C. Shillcock. 2001. Rethinking
the word frequency effect: The neglected role of distributional information in lexical
processing. Language and
Speech 44(3). 295–323.
McEnery, Anthony, Richard Xiao, & Yukio Tono. 2006. Corpus-based
language studies: An advanced resource book. London & New York: Routledge.
Mehl, Seth. 2021. What
we talk about when we talk about corpus frequency: The example of polysemous verbs with light and concrete
senses. Corpus Linguistics and Linguistic
Theory 17(1). 223–247.
Michelbacher, Lukas, Stefan Evert, & Hinrich Schütze. 2011. Asymmetry
in corpus-derived and human word associations. Corpus Linguistics and Linguistic
Theory 7(2). 245–276.
Mildenberger, Thoralf. 2023. Assessing
keyness using permutation
tests. arXiv: 2308.13383v1, last
accessed 25 Aug 2023.
Milin, Petar, Dusica Filipović-Đurđević, & Fermín Moscoso del Prado Martín. 2009. The
simultaneous effects of inflectional paradigms and classes on lexical recognition: Evidence from
Serbian. Journal of Memory and
Language 60(1). 50–64.
Milin, Petar, Victor Kuperman, Aleksandar Kostić, & R. Harald Baayen. 2009. Words
and paradigms bit by bit: An information-theoretic approach to the processing of inflection and
derivation. In James P. Blevins & Juliette Blevins (eds.), Analogy
in grammar: Form and
acquisition, 214–252. Oxford: Oxford University Press.
Millar, Neil & Brian S. Budgell. 2008. The
language of public health – A corpus-based analysis. Journal of Public
Health 16(5). 369–374.
Mollin, Sandra. 2009. Combining
corpus linguistic and psychological data on word co-occurrences: Corpus collocates versus word
associations. Corpus Linguistics and Linguistic
Theory 5(2). 175–200.
Monroe, Burt L., Michael P. Colaresi, & Kevin M. Quinn. 2008. Fightin’
words: Lexical feature selection and evaluation for identifying the content of political
conflict. Political
Analysis 16(4). 372–403.
Monsell, Stephen. 1991. The
nature and locus of word frequency effects in reading. In Derek Besner & Glyn W. Humphreys (eds.), Basic
processes in reading: Visual word
recognition, 148–197. Hillsdale, NJ: Lawrence Erlbaum Associates.
Moran, Matthew D. 2003. Arguments for rejecting sequential
Bonferroni in ecological
studies. OIKOS 100. 403–405.
Morrison, Catriona M., Andrew W. Ellis, & Philip T. Quinlan. 1992. Age
of acquisition, not word frequency, affects object naming, not object recognition. Memory and
Cognition 20. 705–714.
Mukherjee, Joybrato & Tobias Bernaisch. 2015. Cultural
keywords in context: A pilot study of linguistic acculturation in South Asian
Englishes. In Peter Collins (ed.), Grammatical
change in English
world-wide, 411–435. Amsterdam: John Benjamins.
Nakagawa, Shinichi. 2004. A
farewell to Bonferroni: The problems of low statistical power and publication bias. Behavioral
Ecology 15(6). 1044–1045.
Nelson, Robert. 2023. Too
noisy at the bottom: Why Gries’ (2008, 2020) dispersion measures cannot identify unbiased distributions of
words. Journal of Quantitative
Linguistics 30(2). 153–166.
Nenadić, Filip, Petar Milin, & Benjamin V. Tucker. 2021. Relative
entropy effects on the processing of spoken Romanian verbs. The Mental
Lexicon 16(1). 23–48.
Oakes, Michael & Malcolm Farrow. 2007. Use
of the chi-squared test to examine vocabulary differences in English language corpora representing seven different
countries. Literary and Linguistic
Computing 22(1). 85–99.
Oldfield, R. & A. Wingfield. 1965. Response
latencies in naming objects. Quarterly Journal of Experimental
Psychology 17(4). 273–281.
Onnis, Luca, Padraic Monaghan, Morten H. Christiansen, & Nick Chater. 2004. Variability
is the spice of learning, and a crucial ingredient for detecting and generalizing in nonadjacent
dependencies. In Proceedings of the 26th Annual Meeting of the Cognitive
Science Society, 1678–1683.
Paquot, Magali. 2010. Academic
vocabulary in learner writing: From extraction to analysis. London & New York: Continuum.
Paquot, Magali. 2013. Lexical
bundles and transfer effects. International Journal of Corpus
Linguistics 18(3). 391–417.
Paquot, Magali. 2014. Cross-linguistic
influence and formulaic language: Recurrent word sequences in French learner
writing. In Leah Roberts, Ineke Vedder, & Jan H. Hulstijn (eds.), Eurosla
Yearbook, Vol. 14, 216–237. Amsterdam: John Benjamins.
Paquot, Magali. 2017. L1
frequency in foreign language acquisition: Recurrent word combinations in French and Spanish EFL learner
writing. Second Language
Research 33(1). 13–32.
Paquot, Magali & Yves Bestgen. 2009. Distinctive
words in academic writing: A comparison of three statistical tests for keyword
extraction. In Andreas Jucker, Daniel Schreier, & Marianne Hundt (eds.), Corpora:
Pragmatics and
discourse, 247–269. Amsterdam: Rodopi.
Paulsen, Mikkel Ekeland. To appear. Assessing word
commonness: Adding dispersion to frequency. International Journal of Corpus
Linguistics.
Pecina, Pavel. 2010. Lexical
association measures and collocation extraction. Language Resources and
Evaluation 44(1–2). 137–158.
Pedersen, Ted. 1996. Fishing
for exactness. In Proceedings of the South-Central SAS Users Group
Conference (SCSUG-96), 27-29.10.1996, Austin, TX.
Pojanapunya, Punjaporn & Richard Watson Todd. 2018. Log-likelihood
and odds ratio: Keyness statistics for different purposes of keyword analysis. Corpus Linguistics and
Linguistic
Theory 14(1). 133–167.
Rayner, Keith & Susan A. Duffy. 1986. Lexical
complexity and fixation times in reading: Effects of word frequency, verb complexity, and lexical
ambiguity. Memory and
Cognition 14(3). 191–201.
Rayson, Paul, Damon Berridge, & Brian J. Francis. 2004. Extending
the Cochran rule for the comparison of word frequencies between
corpora. In Gérald Purnelle, Cédrick Fairon, & Anne Dister (eds.), Le
poids des mots: Proceedings of the 7th International Conference on Statistical analysis of textual
data, Vol. II, 926–936. Louvain-la-Neuve: Presses Universitaires de Louvain.
Rayson, Paul & Amanda Potts. 2020. Analysing
keyword lists. In Magali Paquot & Stefan Th. Gries (eds.), Practical
handbook of corpus
linguistics, 119–139. Berlin: Springer.
Resnik, Philip. 1996. Selectional
constraints: An information-theoretic model and its computational
realization. Cognition 61(1–2). 127–159.
Rogers, Phillip G. & Stefan Th. Gries. 2022. Grammatical
gender disambiguates syntactically similar
nouns. Entropy 24(4), 520.
Römer, Ute & Stefanie Wulff. 2008. Applying
corpus methods to written academic texts: Explorations of MICUSP. Journal of Writing
Research 2(2). 99–127.
Rosengren, Inger. 1971. The
quantitative concept of language and its relation to the structure of frequency dictionaries. Études
de Linguistique Appliquée (Nouvelle
Série) 1. 103–127.
Savický, Petr & Jaroslava Hlaváčová. 2002. Measures
of word commonness. Journal of Quantitative
Linguistics 9(3). 215–231.
Schmid, Hans Joerg. 2010. Entrenchment, salience, and basic
levels. In Dirk Geeraerts & Hubert Cuyckens (eds.), The
Oxford handbook of cognitive
linguistics, 117–138. Oxford: Oxford University Press.
Schooler, Lael J. & John R. Anderson. 1997. The
role of process in the rational analysis of memory. Cognitive
Psychology 32(3). 219–250.
Schneider, Ulrike. 2020. ΔP
as a measure of collocation strength: Considerations based on analyses of hesitation
placement. Corpus Linguistics and Linguistic
Theory 16(2). 249–274.
Scott, Mike & Christopher Tribble. 2006. Textual
patterns: Key words and corpus analysis in language
education. Amsterdam: John Benjamins.
Seidenberg, Mark S. & Maryellen C. MacDonald. 1999. A
probabilistic constraints approach to language acquisition and processing. Cognitive
Science 23(4). 569–588.
Sheskin, David. 2011. Handbook
of parametric and non-parametric statistical procedures. 5th ed. Boca Raton, FL: Taylor & Francis.
Shlens, Jonathon. 2014. Notes
on Kullback-Leibler Divergence and Likelihood Theory. arXiv
preprint, 1404.2000v1, 8 April
2014.
Siyanova-Chanturia, Anna. 2015. Collocation
in beginner learner writing: A longitudinal
study. System 53. 148–160.
Sönning, Lukas. 2024. Evaluation
of keyness metrics: Performance and reliability. Corpus Linguistics and Linguistic
Theory 20(2). 263–288.
Spärck Jones, Karen. 1972. A
statistical interpretation of term specificity and its application in information retrieval. Journal
of
Documentation 28(1). 11–21.
Stefanowitsch, Anatol & Stefan Th. Gries. 2003. Collostructions:
Investigating the interaction between words and constructions. International Journal of Corpus
Linguistics 8(2). 209–243.
Stubbs, Michael. 1995. Collocations
and semantic profiles: On the cause of the trouble with quantitative methods. Functions of
Language 2(1). 23–55.
Stubbs, Michael. 1996. Text
and corpus analysis: Computer-assisted studies of language and
culture. Oxford: Blackwell.
Suethanapornkul, Sakol & Sarut Supasiraprapa. To
appear. Usage events and constructional knowledge: A study of two variants of the
introductory-it construction. Studies in Second Language
Acquisition.
Sun, Hao & Jean-Pierre Koenig. 2017. There
are more valence alternations than the ditransitive. In Julia Nee, Margaret Cychosz, Dmetri Hayes, Tyler Lau, & Emily Remirez (eds.), Proceedings
of the 43rd Meeting of the Berkeley Linguistics
Society, 291–308. Berkeley, CA: Berkeley Linguistics Society.
Tomokiyo, Takashi & Matthew Hurst. 2003. A
language model approach to keyphrase extraction. In Proceedings of the ACL
2003 Workshop on Multiword Expressions: Analysis, Acquisition and
Treatment, 33–40. Stroudsburg, PA.
Tribble, Christopher. 2002. Small
corpora and teaching writing: Towards a corpus-informed pedagogy of
writing. In Mohsen Ghadessy, Alex Henry, & Robert L. Roseberry (eds.), Small
corpus studies and ELT: Theory and
practice, 381–408. Amsterdam: John Benjamins.
Tucker, Benjamin V., Daniel Brenner, D. Kyle Danielson, Matthew C. Kelley, Filip Nenadić, & Michelle Sims. 2019. The
Massive Auditory Lexical Decision (MALD) database. Behavior Research
Methods 51. 1187–1204.
van Heuven, Walter J. B., Pawel Mandera, Emmanuel Keuleers, & Marc Brysbaert. 2014. SUBTLEX-UK:
A new and improved word frequency database for British English. The Quarterly Journal of Experimental
Psychology 67(6). 1176–1190.
VanPatten, Bill, Jessica Williams, Gregory D. Keating, & Stefanie Wulff. 2020. Introduction:
The nature of theories. In Bill VanPatten, Gregory D. Keating, & Stefanie Wulff (eds.), Theories
in second language acquisition: An
introduction, 1–17. New York, NY: Routledge.
Weisberg, Herbert F. 1974. Models of statistical
relationship. The American Political Science
Review 68(4). 1638–1655.
Wettler, Manfred, Reinhard Rapp, & Peter Sedlmeier. 2005. Free
word associations correspond to contiguities between words in texts. Journal of Quantitative
Linguistics 12(2–3). 111–122.
Wilcox, Allen R. 1973. Indices of qualitative variation and
political measurement. The Western Political
Quarterly 26(2). 325–343.
