Article published in: Pragmatics & Cognition
Vol. 26:2/3 (2019), pp. 357–385
Now, never, or coming soon?
Prediction and efficient language processing
Published online: 12 February 2021
https://doi.org/10.1075/pc.19001.rap
Abstract
The general principles of perceptuo-motor processing and memory give rise to the Now-or-Never bottleneck constraint
imposed on the organization of the language processing system. In particular, the Now-or-Never bottleneck demands an
appropriate structure of linguistic input and rapid incorporation of both linguistic and multisensory contextual information in a
progressive, integrative manner. I argue that the emerging predictive processing framework is well suited for the task of providing a
comprehensive account of language processing under the Now-or-Never constraint. Moreover, this framework presents a stronger alternative to
the Chunk-and-Pass account proposed by Christiansen and Chater (2016), as it better accommodates the available evidence
concerning the role of context (in both the narrow and wider senses) in language comprehension at various levels of linguistic
representation. Furthermore, the predictive processing approach allows for treating language as a special case of domain-general processing
strategies, suggesting deep parallels with other cognitive processes such as vision.
Article outline
- 1. Introduction
- 2. The Now-or-Never bottleneck
- 2.1 Consequences 1 & 3: Multilevel organization of language and incrementality of language processing
- 2.2 Consequences 2 & 4: Prediction, locality, and language modularity
- 2.3 Not all mistakes are corrected
- 2.4 Requirements for an account of language processing: A quick summary
- 3. Predictive processing framework
- 3.1 What is predictive processing?
- 3.2 Applying predictive processing to language: Solutions to the Now-or-Never bottleneck and some implications
- 4. Worries and future directions
- 5. Conclusion
- Acknowledgements
- Notes
References (101)
Adams, Rick A., Klaas Enno Stephan, Harriet R. Brown, Christopher D. Frith & Karl J. Friston. 2013. The computational anatomy of psychosis. Frontiers in Psychiatry 4. 47.
Adelson, Beth. 1984. When novices surpass experts: The difficulty of a task may increase with expertise. Journal of Experimental Psychology: Learning, Memory, and Cognition 10(3). 483–495.
Allen, Roy, Peter McGeorge, David Pearson & Alan B. Milne. 2004. Attention and expertise in multiple target tracking. Applied Cognitive Psychology: The Official Journal of the Society for Applied Research in Memory and Cognition 18(3). 337–347.
Barrett, Lisa Feldman & Moshe Bar. 2009. See it with feeling: Affective predictions during object perception. Philosophical Transactions of the Royal Society of London: Biological Sciences 364(1521). 1325–1334.
Barton, Stephen B. & Anthony J. Sanford. 1993. A case study of anomaly detection: Shallow semantic processing and cohesion establishment. Memory & Cognition 21(4). 477–487.
Bever, Thomas G. 1970. The cognitive basis for linguistic structures. Cognition and the Development of Language 279(362). 1–61.
Brown, Harriet & Karl J. Friston. 2012. Free-energy and illusions: The Cornsweet effect. Frontiers in Psychology 3. 43.
Cantor, Alison D. & Elizabeth J. Marsh. 2017. Expertise effects in the Moses illusion: Detecting contradictions with stored knowledge. Memory 25(2). 220–230.
Castel, Alan D., David P. McCabe, Henry L. Roediger III & Jeffrey Heitman. 2007. The dark side of expertise: Domain-specific memory errors. Psychological Science 18(1). 3–5.
Che, Wanxiang & Yue Zhang. 2018. Deep learning in lexical analysis and parsing. Deep Learning in Natural Language, 79–116. Springer, Singapore.
Chi, Michelene T., Paul J. Feltovich & Robert Glaser. 1981. Categorization and representation of physics problems by experts and novices. Cognitive Science 5(2). 121–152.
Cho, Kyunghyun, Bart Van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk & Yoshua Bengio. 2014. Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078. (20 June, 2018.)
Christiansen, Morten & Nick Chater. 2008. Language as shaped by the brain. Behavioral and Brain Sciences 31(5). 489–509.
Christiansen, Morten & Nick Chater. 2016a. The Now-or-Never bottleneck: A fundamental constraint on language. Behavioral and Brain Sciences 39. 1–19.
Christiansen, Morten & Nick Chater. 2016b. Squeezing through the Now-or-Never bottleneck: Reconnecting language processing, acquisition, change, and structure. Behavioral and Brain Sciences 39. 46–58.
Churchland, Patricia S., Vilayanur S. Ramachandran & Terrence J. Sejnowski. 1994. A critique of pure vision. In Christof Koch & Joel C. Davis (eds.), Large-scale Neuronal Theories of the Brain, 23–60. Cambridge: MIT Press.
Clark, Andy. 2013. Whatever next? Predictive brains, situated agents, and the future of cognitive science. Behavioral and Brain Sciences 36(3). 181–204.
Clark, Andy. 2015c. Surfing uncertainty: Prediction, action, and the embodied mind. New York, NY: Oxford University Press.
Cohen, Michael A., Daniel C. Dennett & Nancy Kanwisher. 2016. What is the bandwidth of perceptual experience? Trends in Cognitive Sciences 20(5). 324–335.
Colombo, Matteo & Stephan Hartmann. 2015. Bayesian cognitive science, unification, and explanation. The British Journal for the Philosophy of Science 68(2). 451–484.
Cowan, Nelson. 2001. The magical number 4 in short-term memory: A reconsideration of mental storage capacity. Behavioral and Brain Sciences 24(1). 87–114.
Cowan, Nelson. 2010. The Magical Mystery Four: How is Working Memory Capacity Limited, and Why? Current Directions in Psychological Science 19(1). 51–57.
DeLong, Katherine A., Thomas P. Urbach & Marta Kutas. 2005. Probabilistic word pre-activation during language comprehension inferred from electrical brain activity. Nature Neuroscience 8(8). 1117.
DeLong, Katherine A., Thomas P. Urbach & Marta Kutas. 2017. Concerns with Nieuwland et al. multi-lab study (2017). Kutas Cognitive Electrophysiology Lab Working Paper. [URL]. (17 May 2018.)
Drenhaus, Heiner, Vera Demberg, Judith Köhne & Francesca Delogu. 2014. Incremental and predictive discourse processing based on causal and concessive discourse markers: ERP studies on German and English. Annual Meeting of the Cognitive Science Society 36(36).
Elsabbagh, Mayada & Annette Karmiloff-Smith. 2006. Modularity of mind and language. The Encyclopaedia of Language and Linguistics, 218–224.
Ericsson, K. Anders, William G. Chase & Steve Faloon. 1980. Acquisition of a memory skill. Science 208(4448). 1181–1182.
Erickson, Thomas D. & Mark E. Mattson. 1981. From words to meaning: A semantic illusion. Journal of Verbal Learning and Verbal Behavior 20(5). 540–551.
Federmeier, Kara D. & Marta Kutas. 1999. A rose by any other name: Long-term memory structure and sentence processing. Journal of Memory and Language 41(4). 469–495.
Feldman, Harriet & Karl Friston. 2010. Attention, uncertainty, and free-energy. Frontiers in Human Neuroscience 4. 215.
Ferreira, Fernanda, Karl G. D. Bailey & Vittoria Ferraro. 2002. Good-enough representations in language comprehension. Current Directions in Psychological Science 11(1). 11–15.
Ferreira, Fernanda & Charles Clifton Jr. 1986. The independence of syntactic processing. Journal of Memory and Language 25(3). 348–368.
Filik, Ruth & Anthony J. Sanford. 2008. When is cataphoric reference recognized? Cognition 107(3). 1112–1121.
Fletcher, Paul C. & Chris D. Frith. 2009. Perceiving is believing: A Bayesian approach to explaining the positive symptoms of schizophrenia. Nature Reviews Neuroscience 10(1). 48–58.
Friston, Karl. 2002. Beyond phrenology: What can neuroimaging tell us about distributed circuitry? Annual Review of Neuroscience 25(1). 221–250.
Friston, Karl. 2005. A theory of cortical responses. Philosophical Transactions of the Royal Society of London: Biological Sciences 360(1456). 815–836.
Friston, Karl, Marco Lin, Chris D. Frith, Giovanni Pezzulo, J. Allan Hobson & Sasha Ondobaka. 2017. Active inference, curiosity and insight. Neural Computation 29(10). 2633–2683.
Friston, Karl, Francesco Rigoli, Dmitri Ognibene, Christoph Mathys, Thomas Fitzgerald & Giovanni Pezzulo. 2015. Active inference and epistemic value. Cognitive Neuroscience 6(4). 187–214.
Giard, Marie H. & F. Péronnet. 1999. Auditory-visual integration during multimodal object recognition in humans: A behavioral and electrophysiological study. Journal of Cognitive Neuroscience 11(5). 473–490.
Glorot, Xavier, Antoine Bordes & Yoshua Bengio. 2011. Domain adaptation for large-scale sentiment classification: A deep learning approach. 28th International Conference on Machine Learning (ICML-11), 513–520.
Gregory, Richard L. 1997. Knowledge in perception and illusion. Philosophical Transactions of the Royal Society: Biological Sciences 352(1358). 1121–1127.
Hashimoto, Kazuma, Caiming Xiong, Yoshimasa Tsuruoka & Richard Socher. 2016. A joint many-task model: Growing a neural network for multiple NLP tasks. arXiv preprint arXiv:1611.01587. (15 June, 2018.)
Heeger, David J. 2017. Theory of cortical function. Proceedings of the National Academy of Sciences 114(8). 1773–1782.
Heiser, Marc, Marco Iacoboni, Fumiko Maeda, Jake Marcus & John C. Mazziotta. 2003. The essential role of Broca’s area in imitation. European Journal of Neuroscience 17(5). 1123–1128.
Helmholtz, Hermann. 1860. Treatise on physiological optics (J. P. C. Southall, Trans. 1962 ed., Vol. 3). New York: Dover.
Hohwy, Jacob, Andreas Roepstorff & Karl Friston. 2008. Predictive coding explains binocular rivalry: An epistemological review. Cognition 108(3). 687–701.
Horga, Guillermo, Kelly C. Schatz, Anissa Abi-Dargham & Bradley S. Peterson. 2014. Deficits in Predictive Coding Underlie Hallucinations in Schizophrenia. The Journal of Neuroscience 34(24). 8072–8082.
Johnson, Kathy E. & Carolyn B. Mervis. 1997. Effects of varying levels of expertise on the basic level of categorization. Journal of Experimental Psychology: General 126(3). 248–277.
Karimi, Hossein & Fernanda Ferreira. 2016. Good-enough linguistic representations and online cognitive equilibrium in language processing. The Quarterly Journal of Experimental Psychology 69(5). 1013–1040.
Kempson, Ruth, Eleni Gregoromichelaki & Christine Howes. 2018. Language as an adaptive tool for interaction: A niche effect or a radical departure? Dynamic Syntax Workshop. 11. Edinburgh, UK.
Kirby, Simon, Kenny Smith & Hannah Cornish. 2008. Language, Learning and Cultural Evolution: How linguistic transmission leads to cumulative adaptation. In Robin Cooper & Ruth Kempson (eds.), Language in Flux: Dialogue Coordination, Language Variation, Change and Evolution. London: College Publications.
Kiros, Ryan, Ruslan Salakhutdinov & Richard S. Zemel. 2014. Multimodal neural language models. Proceedings of the 31st International Conference on Machine Learning, PMLR 32(2). 595–603.
Köhne, Judith & Vera Demberg. 2013. The time-course of processing discourse connectives. Proceedings of the Annual Meeting of the Cognitive Science Society 35. Retrieved from [URL]. (20 June, 2018.)
Kowsari, Kamran, Donald E. Brown, Mojtaba Heidarysafa, Kiana Jafari Meimandi, Matthew S. Gerber & Laura E. Barnes. 2017. Hdltex: Hierarchical deep learning for text classification. Machine Learning and Applications (ICMLA), 364–371.
Kuperberg, Gina R. & T. Florian Jaeger. 2016. What do we mean by prediction in language comprehension? Language, Cognition and Neuroscience 31(1). 32–59.
Lee, Tai Sing & David Mumford. 2003. Hierarchical Bayesian inference in the visual cortex. Journal of the Optical Society of America A, Optics, Image Science, and Vision 20(7). 1434–1448.
Liu, Jingzhou, Wei-Cheng Chang, Yuexin Wu & Yiming Yang. 2017. Deep learning for extreme multi-label text classification. 40th International ACM SIGIR Conference on Research and Development in Information Retrieval (ACM), 115–124.
Long, Mingsheng & Jianmin Wang. 2015. Learning multiple tasks with deep relationship networks. CoRR. 31.
Lotem, Arnon, Oleg Kolodny, Joseph Y. Halpern, Luca Onnis & Shimon Edelman. 2016. The bottleneck may be the solution, not the problem. Behavioral and Brain Sciences 39. 39–40.
MacDonald, John & Harry McGurk. 1978. Visual influences on speech perception processes. Perception & Psychophysics 24(3). 253–257.
MacDonald, Maryellen C., Neal J. Pearlmutter & Mark S. Seidenberg. 1994. The lexical nature of syntactic ambiguity resolution. Psychological Review 101(4). 676–703.
Marr, David. 1976. Early processing of visual information. Philosophical Transactions of the Royal Society of London: Biological Sciences 275(942). 483–519.
McDonald, Scott A. & Richard C. Shillcock. 2003. Eye movements reveal the on-line computation of lexical probabilities during reading. Psychological Science 14(6). 648–652.
Misra, Ishan, Abhinav Shrivastava, Abhinav Gupta & Martial Hebert. 2016. Cross-stitch networks for multi-task learning. IEEE Conference on Computer Vision and Pattern Recognition, 3994–4003.
Molholm, Sophie, Walter Ritter, Micah M. Murray, Daniel C. Javitt, Charles E. Schroeder & John J. Foxe. 2002. Multisensory auditory–visual interactions during early sensory processing in humans: A high-density electrical mapping study. Cognitive Brain Research 14(1). 115–128.
Ngiam, Jiquan, Aditya Khosla, Mingyu Kim, Juhan Nam, Honglak Lee & Andrew Y. Ng. 2011. Multimodal deep learning. 28th International Conference on Machine Learning (ICML-11), 689–696.
Nieuwland, Mante S., Stephen Politzer-Ahles, Evelien Heyselaar, Katrien Segaert, Emily Darley, Nina Kazanina, Sarah von Grebmer zu Wolfsthurn et al. 2017. Limits on prediction in language comprehension: A multi-lab failure to replicate evidence for probabilistic pre-activation of phonology. BioRxiv. (05 July 2018.)
Orlandi, Nico & Geoff Lee. 2019. How Radical is Predictive Processing? In Matteo Colombo, Elizabeth Irvine & Mog Stapleton (eds.), Andy Clark and his Critics. 206–219. New York, NY: Oxford University Press.
Papathomas, Thomas V. 2017. The Hollow-Mask Illusion and Variations. In Arthur G. Shapiro & Dejan Todorović (eds.), The Oxford Compendium of Visual Illusions. 614–619. New York, NY: Oxford University Press.
Pashler, Harold. 1988. Familiarity and visual change detection. Perception & Psychophysics 44(4). 369–378.
Pellicano, Elizabeth & David Burr. 2012. When the world becomes ‘too real’: A Bayesian explanation of autistic perception. Trends in Cognitive Sciences 16(10). 504–510.
Pezzulo, Giovanni. 2014. Why do you fear the bogeyman? An embodied predictive coding model of perceptual inference. Cognitive, Affective, & Behavioral Neuroscience 14(3). 902–911.
Rao, Rajesh P. & Dana H. Ballard. 1999. Predictive coding in the visual cortex: A functional interpretation of some extra-classical receptive-field effects. Nature Neuroscience 2(1). 79.
Reed, Scott, Zeynep Akata, Xinchen Yan, Lajanugen Logeswaran, Bernt Schiele & Honglak Lee. 2016. Generative adversarial text to image synthesis. arXiv preprint arXiv:1605.05396.
Remez, Robert E., Daria F. Ferro, Kathryn R. Dubowski, Judith Meer, Robin S. Broder & Morgana L. Davids. 2010. Is desynchrony tolerance adaptable in the perceptual organization of speech? Attention, Perception, & Psychophysics 72(8). 2054–2058.
Rohde, Hannah & William S. Horton. 2014. Anticipatory looks reveal expectations about discourse relations. Cognition 133(3). 667–691.
Roy, Deb & Niloy Mukherjee. 2005. Towards situated speech understanding: Visual context priming of language models. Computer Speech & Language 19(2). 227–248.
Shallice, Tim. 1988. From neuropsychology to mental structure. New York, NY: Cambridge University Press.
Simons, Daniel J. & Christopher F. Chabris. 1999. Gorillas in our midst: Sustained inattentional blindness for dynamic events. Perception 28(9). 1059–1074.
Slattery, Timothy J., Patrick Sturt, Kiel Christianson, Masaya Yoshida & Fernanda Ferreira. 2013. Lingering misinterpretations of garden path sentences arise from competing syntactic representations. Journal of Memory and Language 69(2). 104–120.
Sperber, Dan. 2002. In defence of massive modularity. In Emmanuel Dupoux (ed.), Language, Brain and Cognitive Development: Essays in Honor of Jacques Mehler, 47–57. Cambridge, Mass.: MIT Press.
Spratling, Michael W. 2008. Reconciling predictive coding and biased competition models of cortical function. Frontiers in Computational Neuroscience 2. 4.
Stephan, Klaas E., Zina M. Manjaly, Christoph D. Mathys, Lilian A. Weber, Saee Paliwal, Tim Gard, Marc Tittgemeyer et al. 2016. Allostatic self-efficacy: A metacognitive theory of dyshomeostasis-induced fatigue and depression. Frontiers in Human Neuroscience 10. 550.
Sutskever, Ilya, Oriol Vinyals & Quoc V. Le. 2014. Sequence to sequence learning with neural networks. Advances in Neural Information Processing Systems 27. 3104–3112.
Talsma, Durk. 2015. Predictive coding and multisensory integration: An attentional account of the multisensory mind. Frontiers in Integrative Neuroscience 9. 19.
Taylor, John R. 2012. The mental corpus: How language is represented in the mind. Oxford University Press.
Traxler, Matthew J. 2012. Introduction to psycholinguistics: Understanding language science. Chichester, UK, Malden, Mass.: Wiley-Blackwell.
Van Oostendorp, Herre & Sjaak De Mul. 1990. Moses beats Adam: A semantic relatedness effect on a semantic illusion. Acta Psychologica 74(1). 35–46.
Vervaeke, John, Timothy P. Lillicrap & Blake A. Richards. 2012. Relevance realization and the emerging framework in cognitive science. Journal of Logic and Computation 22(1). 79–99.
Wiese, Wanja. 2018. Experienced Wholeness: Integrating Insights from Gestalt Theory, Cognitive Neuroscience, and Predictive Processing. Cambridge, Mass.: MIT Press.
Wu, Yonghui, Mike Schuster, Zhifeng Chen, Quoc V. Le, Mohammad Norouzi, Wolfgang Macherey, Maxim Krikun et al. 2016. Google’s neural machine translation system: Bridging the gap between human and machine translation. arXiv preprint arXiv:1609.08144.
