N-gram probability effects in a cloze task

Shaoul, Cyrus; Baayen, R. Harald; Westbury, Chris F.

doi:10.1075/ml.9.3.04sha

Article published in: Bridging the Methodological Divide: Linguistic and psycholinguistic approaches to formulaic language
Edited by Stefanie Wulff and Debra Titone
[The Mental Lexicon 9:3] 2014
► pp. 437–472

Cyrus Shaoul | University of Tübingen

R. Harald Baayen | University of Tübingen / University of Alberta

Chris F. Westbury | University of Alberta

Published online: 23 January 2015

https://doi.org/10.1075/ml.9.3.04sha

What knowledge influences our choice of words when we write or speak? Predicting which word a person will produce next is not easy, even when the linguistic context is known. One task that has been used to assess context-dependent word choice is the fill-in-the-blank task, also called the cloze task. The cloze probability for a specific context is an empirical measure found by asking many people to fill in the blank. In this paper we harness the power of large corpora to look at the influence of corpus-derived probabilistic information from a word’s micro-context on word choice. We asked young adults to complete short phrases called n-grams with up to 20 responses per phrase. The probability of the response word and the conditional probability of the response given the context were predictive of the frequency with which each response was produced. Furthermore, the order in which the participants generated multiple completions of the same context was also predicted by the conditional probability. These results suggest that word choice in cloze tasks taps into implicit knowledge of a person’s past experience with that word in various contexts. Moreover, the importance of n-gram conditional probabilities in our analysis is further evidence of implicit knowledge about multi-word sequences and supports theories of language processing that involve anticipating or predicting based on context.

Keywords: formulaic language, multi-word expressions, n-grams, cloze probability, production
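
The two quantities the abstract contrasts can be illustrated with a minimal sketch. It assumes that cloze probability is estimated as the share of participants producing a given completion, and that the corpus-derived conditional probability is estimated as the count of the full n-gram divided by the count of its context. The function names, the responses, and the counts below are hypothetical and chosen only for illustration; they are not taken from the article or from any corpus.

```python
from collections import Counter


def cloze_probabilities(responses):
    """Share of all responses accounted for by each completion of one context."""
    counts = Counter(responses)
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


def conditional_probability(ngram_count, context_count):
    """Estimate P(word | context) as count(context + word) / count(context)."""
    if context_count == 0:
        return 0.0  # unseen context: no estimate is possible, so return 0
    return ngram_count / context_count


# Hypothetical completions from five participants for a single context.
responses = ["time", "time", "day", "time", "week"]
cloze = cloze_probabilities(responses)

# Hypothetical corpus counts: the context alone, and the context plus each word.
context_count = 1000
ngram_counts = {"time": 400, "day": 100, "week": 50}
conditional = {
    word: conditional_probability(count, context_count)
    for word, count in ngram_counts.items()
}

for word in sorted(cloze, key=cloze.get, reverse=True):
    print(word, round(cloze[word], 2), round(conditional[word], 2))
```

Placing the two estimates side by side for each completion is the kind of comparison the abstract describes, in which corpus probabilities are used to predict how often each response is produced.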

Cited by 12 other publications

Agmon, Galit, Manuela Jaeger, Ella Magen, Danna Pinto, Yuval Perelmuter, Elana Zion Golumbic & Martin G. Bleichner

2025. Challenges and Methods in Annotating Natural Speech for Neurolinguistic Research. Neurobiology of Language 6

Hofmann, Markus J., Steffen Remus, Chris Biemann, Ralph Radach & Lars Kuchinke

2022. Language Models Explain Word Reading Times Better Than Empirical Predictability. Frontiers in Artificial Intelligence 4

Wang, Guanfang, Xianshan Chen, Geng Tian, Jiasheng Yang & Huiling Chen

2022. A Novel N-Gram-Based Image Classification Model and Its Applications in Diagnosing Thyroid Nodule and Retinal OCT Images. Computational and Mathematical Methods in Medicine 2022 ► pp. 1 ff.

Jacobs, Cassandra L.

2021. Quantifying Context With and Without Statistical Language Models. In Handbook of Cognitive Mathematics, ► pp. 1 ff.

Jacobs, Cassandra L.

2022. Quantifying Context With and Without Statistical Language Models. In Handbook of Cognitive Mathematics, ► pp. 1053 ff.

Hollis, Geoff

2019. Learning about things that never happened: A critique and refinement of the Rescorla-Wagner update rule when many outcomes are possible. Memory & Cognition 47:7 ► pp. 1415 ff.

Manshu, Tu & Zhao Xuemin

2019. CCHAN: An End to End Model for Cross Domain Sentiment Classification. IEEE Access 7 ► pp. 50232 ff.

Lõo, Kaidi, Juhani Järvikivi, Fabian Tomaschek, Benjamin V. Tucker & R. Harald Baayen

2018. Production of Estonian case-inflected nouns shows whole-word frequency and paradigmatic effects. Morphology 28:1 ► pp. 71 ff.

Bejar, Isaac I., Paul D. Deane, Michael Flor & Jing Chen

2017. Evidence of the Generalization and Construct Representation Inferences for the GRE® revised General Test Sentence Equivalence Item Type. ETS Research Report Series 2017:1 ► pp. 1 ff.

Baayen, R. Harald, Petar Milin & Michael Ramscar

2016. Frequency in lexical processing. Aphasiology 30:11 ► pp. 1174 ff.

Jacobs, Cassandra L., Gary S. Dell, Aaron S. Benjamin & Colin Bannard

2016. Part and whole linguistic experience affect recognition memory for multiword sequences. Journal of Memory and Language 87 ► pp. 38 ff.

Matusevych, Yevgen, Afra Alishahi & Ad Backus

2016. Modelling verb selection within argument structure constructions. Language, Cognition and Neuroscience 31:10 ► pp. 1215 ff.

This list is based on CrossRef data as of 27 November 2025. Please note that it may not be complete. Sources presented here have been supplied by the respective publishers. Any errors therein should be reported to them.