Article In: Functions of Language: Online-First Articles
A cross-linguistic computational study on one new idea per clause
This content is being prepared for publication; it may be subject to changes.
Abstract
We investigate a notion related to Chafe’s One New Idea Constraint (ONICON), namely that clauses contain only one
new idea at a time. We examine both the nominal and the verbal domain in discourse production data from 16 diverse languages.
We find no convincing evidence for a hard constraint on the number of new ideas per clause, nor for speakers actively managing
information flow in discourse. The absence of an underlying ONICON is suggested by (1) the fact that NPs co-occurring in a single
clause do not have more accessible referents, (2) the fact that the observed distributions of information can be reproduced by a
randomised mechanism, and (3) the absence of a presentational referent activation pattern. A more likely account of the observed
discourse production patterns is producers’ overall goal to be both contentful and coherent, the former implying the mention of
a considerable number of entities and the latter implying predications about these entities. It is in this sense that there is
audience design, rather than in an avoidance of information overload.
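Point (2) of the abstract turns on comparing the observed distribution of new ideas with one generated by chance. The following is a minimal Python sketch of one such baseline, not the procedure used in the study: it assumes that each NP mention is coded for its clause and for whether its referent is new, shuffles the newness labels across mentions while keeping clause membership fixed, and recomputes the share of clauses with more than one new referent. The function names, the toy data and the label-shuffling scheme are all illustrative assumptions.

```python
import random


def share_multi_new(clause_ids, is_new):
    """Return the share of clauses that contain more than one new mention."""
    new_per_clause = {}
    for cid, new in zip(clause_ids, is_new):
        # Register every clause, including those with no new mention.
        new_per_clause[cid] = new_per_clause.get(cid, 0) + int(new)
    multi = sum(1 for n in new_per_clause.values() if n > 1)
    return multi / len(new_per_clause)


def permutation_baseline(clause_ids, is_new, n_iter=1000, seed=0):
    """Shuffle newness labels across mentions; clause membership stays fixed."""
    rng = random.Random(seed)
    labels = list(is_new)  # copy so the caller's data is not modified
    shares = []
    for _ in range(n_iter):
        rng.shuffle(labels)
        shares.append(share_multi_new(clause_ids, labels))
    return shares


# Hypothetical toy data: four clauses with two NP mentions each,
# exactly one new referent per clause.
toy_clause_ids = [1, 1, 2, 2, 3, 3, 4, 4]
toy_is_new = [True, False, True, False, True, False, True, False]

observed = share_multi_new(toy_clause_ids, toy_is_new)
baseline = permutation_baseline(toy_clause_ids, toy_is_new, n_iter=200)

# Proportion of shuffled samples with a share at or below the observed one.
at_or_below = sum(s <= observed for s in baseline) / len(baseline)
print(observed, at_or_below)
```

If the observed share falls well within the range of the shuffled shares, the one-new-idea pattern requires no dedicated constraint under this particular baseline; a share far below that range would point the other way.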
Article outline
- 1. Introduction
- 2. The one-new-idea pattern and underlying constraints
- 3. Study design
- 3.1 Corpus data and feature extraction
- 3.2 Measuring information throughput in production data
- 3.3 Measuring active information management
- 3.4 Hypotheses
- 4. Results
- 4.1 Nominal new ideas
- 4.2 Verbal new ideas
- 4.3 Information management
- 5. Discussion
- Author queries
References
Arnold, Jennifer E. 2010. How speakers refer: The role of accessibility. Language and Linguistics Compass 4(4). 187–203.
Barth, Danielle, Kira Davey & Maria Matheas. 2023. Multi-CAST Matukar Panau. In Geoffrey Haig & Stefan Schnell (eds.), Multi-CAST: Multilingual corpus of annotated spoken texts. Version 2311. Bamberg: University of Bamberg.
Bogomolova, Natalia, Dmitry Ganenkov & Nils N. Schiborr. 2021. Multi-CAST Tabasaran. In Geoffrey Haig & Stefan Schnell (eds.), Multi-CAST: Multilingual corpus of annotated spoken texts. Version 2311. Bamberg: University of Bamberg.
Brennan, Susan E. 1995. Centering attention in discourse. Language and Cognitive Processes 10(2). 137–167.
Chafe, Wallace. 1994. Discourse, consciousness, and time: The flow and displacement of conscious experience in speaking and writing. Chicago, IL: University of Chicago Press.
Chafe, Wallace L. 1987. Cognitive constraints on information flow. In Russell S. Tomlin (ed.), Coherence and grounding in discourse, 21–51. Amsterdam: John Benjamins.
Christiansen, Morten H. & Nick Chater. 2016. The now-or-never bottleneck: A fundamental constraint on language. Behavioral and Brain Sciences 39. e62.
Clark, Herbert H. & Gregory L. Murphy. 1982. Audience design in meaning and reference. In Jean-François Le Ny & Walter Kintsch (eds.), Advances in psychology, 287–299. Amsterdam: North-Holland.
Comrie, Bernard, Martin Haspelmath & Balthasar Bickel. The Leipzig glossing rules: Conventions for interlinear morpheme-by-morpheme glosses. Leipzig: Max Planck Institute for Evolutionary Anthropology.
Croft, William. 2007. Intonation units and grammatical structure in Wardaman and in cross-linguistic perspective. Australian Journal of Linguistics 27(1). 1–39.
Du Bois, John W. 2003. Discourse and grammar. In Michael Tomasello (ed.), The new psychology of language: Cognitive and functional approaches to language structure, 47–88. Mahwah, NJ: Lawrence Erlbaum.
Egurtzegi, Aitor, Damián E. Blasi, Sebastian Sauppe, Balthasar Bickel & Stefan Schnell. 2025. Discourse ergativity and human reference in Basque. Studies in Language 49(3). 682–711.
Evans, Nicholas & Stephen C. Levinson. 2009. The myth of language universals: Language diversity and its importance for cognitive science. Behavioral and Brain Sciences 32(5). 429–448.
Forker, Diana & Nils N. Schiborr. 2019. Multi-CAST Sanzhi Dargwa. In Geoffrey Haig & Stefan Schnell (eds.), Multi-CAST: Multilingual corpus of annotated spoken texts. Version 2311. Bamberg: University of Bamberg.
Ganenkov, Dmitry & Nils N. Schiborr. 2025. Multi-CAST Chirag. In Geoffrey Haig & Stefan Schnell (eds.), Multi-CAST: Multilingual corpus of annotated spoken texts. Version 2507. Bamberg: University of Bamberg.
Givón, Talmy. 1975. Focus and scope of assertion: Some Bantu evidence. Studies in African Linguistics 6(2). 185–205.
Givón, Talmy. 1983. Topic continuity in discourse: An introduction. In Talmy Givón (ed.), Topic continuity in discourse, 1–42. Amsterdam: John Benjamins.
Givón, Talmy. 1984. Syntax: A functional-typological introduction (Vol. 21). Amsterdam: John Benjamins.
Goldberg, Adele E. 2004. Discourse and argument structure. In Laurence R. Horn & Gregory Ward (eds.), The handbook of pragmatics, 427–441. Malden, MA: Blackwell.
Gundel, Jeanette K., Nancy Hedberg & Ron Zacharski. 1993. Cognitive status and the form of referring expressions in discourse. Language 69(2). 274–307.
Hadjidas, Harris & Maria Vollmer. 2015. Multi-CAST Cypriot Greek. In Geoffrey Haig & Stefan Schnell (eds.), Multi-CAST: Multilingual corpus of annotated spoken texts. Version 2311. Bamberg: University of Bamberg.
Haig, Geoffrey & Stefan Schnell. 2014. Annotations using GRAID (Grammatical relations and animacy in discourse): Introduction and guidelines for annotators (version 7.0). Bamberg: University of Bamberg.
Haig, Geoffrey & Stefan Schnell. 2023. Multi-CAST: Multilingual corpus of annotated spoken texts. Version 2311. Bamberg: University of Bamberg.
Haig, Geoffrey, Stefan Schnell & Nils Norman Schiborr. 2021. Universals of reference in discourse and grammar: Evidence from the Multi-CAST collection of spoken corpora. In Geoffrey Haig, Stefan Schnell & Frank Seifart (eds.), Doing corpus-based typology with spoken language corpora: State of the art, 141–177. Honolulu, HI: University of Hawai’i Press.
Haig, Geoffrey, Maria Vollmer & Hanna Thiele. 2019. Multi-CAST Northern Kurdish. In Geoffrey Haig & Stefan Schnell (eds.), Multi-CAST: Multilingual corpus of annotated spoken texts. Version 2311. Bamberg: University of Bamberg.
Himmelmann, Nikolaus P. 2022. Prosodic phrasing and the emergence of phrase structure. Linguistics 60(3). 715–743.
Himmelmann, Nikolaus P., Meytal Sandler, Jan Strunk & Volker Unterladstetter. 2018. On the universality of intonational phrases: A cross-linguistic interrater study. Phonology 35(2). 207–245.
Hymes, Dell. 1972. On communicative competence. In John B. Pride & Janet Holmes (eds.), Sociolinguistics: Selected readings, 269–293. Harmondsworth: Penguin.
Inbar, Maya, Shir Genzer, Anat Perry, Eitan Grossman & Ayelet N. Landau. 2023. Intonation units in spontaneous speech evoke a neural response. Journal of Neuroscience 43(48). 8189–8200.
Jaeger, T. Florian. 2010. Redundancy and reduction: Speakers manage syntactic information density. Cognitive Psychology 61(1). 23–62.
Karttunen, Lauri. 1976. Discourse referents. In James D. McCawley (ed.), Notes from the linguistic underground, 363–385. New York, NY: Academic Press.
Konopka, Agnieszka E. 2012. Planning ahead: How recent experience with structures and words changes the scope of linguistic planning. Journal of Memory and Language 66(1). 143–162.
Kurabe, Keita. 2021. Multi-CAST Jinghpaw. In Geoffrey Haig & Stefan Schnell (eds.), Multi-CAST: Multilingual corpus of annotated spoken texts. Version 2311. Bamberg: University of Bamberg.
Labov, William & Joshua Waletzky. 1967. Narrative analysis: Oral versions of personal experience. In June Helm (ed.), Proceedings of the 1966 Annual Spring Meeting of the American Ethnological Society, 12–44. Seattle, WA: University of Washington Press.
Lambrecht, Knud. 1994. Information structure and sentence form: Topic, focus, and the mental representations of discourse referents. Cambridge: Cambridge University Press.
Levy, Roger & T. Florian Jaeger. 2007. Speakers optimize information density through syntactic reduction. In Bernhard Schölkopf, John Platt & Thomas Hofmann (eds.), Advances in neural information processing systems 19: Proceedings of the 2006 conference, 849–856. Cambridge, MA: MIT Press.
Linders, Guido M. & Stefan Schnell. 2025. Functional factors predict referential choice similarly across languages: A cross-linguistic computational analysis. PsyArXiv. 23p4m_v1.
MacDonald, Maryellen C. 2013. How language production shapes language form and comprehension. Frontiers in Psychology 4(226). 1–16.
Meng, Chenxi. 2019. Multi-CAST Tulil. In Geoffrey Haig & Stefan Schnell (eds.), Multi-CAST: Multilingual corpus of annotated spoken texts. Version 2311. Bamberg: University of Bamberg.
Mosel, Ulrike & Stefan Schnell. 2015. Multi-CAST Teop. In Geoffrey Haig & Stefan Schnell (eds.), Multi-CAST: Multilingual corpus of annotated spoken texts. Version 2311. Bamberg: University of Bamberg.
Pawley, Andrew & Frances H. Syder. 2000. The one-clause-at-a-time hypothesis. In Heidi Riggenbach (ed.), Perspectives on fluency, 163–199. Ann Arbor, MI: University of Michigan Press.
Peck, Naomi & Laura Becker. 2024. Syntactic pausing? Re-examining the associations. Linguistics Vanguard 10(1). 223–237.
Polanyi, Livia. 2005. The linguistic structure of discourse. In Deborah Schiffrin, Deborah Tannen & Heidi E. Hamilton (eds.), The handbook of discourse analysis, 265–281. Malden, MA: Blackwell.
R Core Team. 2025. R: A language and environment for statistical computing. R Foundation for Statistical Computing. [URL]
Schiborr, Nils N. 2015. Multi-CAST English. In Geoffrey Haig & Stefan Schnell (eds.), Multi-CAST: Multilingual corpus of annotated spoken texts. Version 2311. Bamberg: University of Bamberg.
Schiborr, Nils N. 2018. MulticastR: A companion to the Multi-CAST collection. In Geoffrey Haig & Stefan Schnell (eds.), Multi-CAST: Multilingual corpus of annotated spoken texts. Bamberg: University of Bamberg.
Schiborr, Nils N. 2023. Lexical anaphora: A corpus-based typological study of referential choice. Bamberg: University of Bamberg Press.
Schiborr, Nils N., Stefan Schnell & Hanna Thiele. 2018. RefIND – Referent indexing in natural-language discourse: Annotation guidelines (v1.1). Bamberg: University of Bamberg.
Schnell, Stefan. 2015. Multi-CAST Vera’a. In Geoffrey Haig & Stefan Schnell (eds.), Multi-CAST: Multilingual corpus of annotated spoken texts. Version 2311. Bamberg: University of Bamberg.
Schnell, Stefan, Nils N. Schiborr & Geoffrey Haig. 2021. Efficiency in discourse processing: Does morphosyntax adapt to accommodate new referents? Linguistics Vanguard 7(s3). 20190064.
Seifart, Frank & Tai Hong. 2022. Multi-CAST Bora. In Geoffrey Haig & Stefan Schnell (eds.), Multi-CAST: Multilingual corpus of annotated spoken texts. Version 2311. Bamberg: University of Bamberg.
Seifart, Frank, Jan Strunk, Swintha Danielsen, Iren Hartmann, Brigitte Pakendorf, Søren Wichmann, Alena Witzlack-Makarevich, Nivja H. de Jong & Balthasar Bickel. 2018. Nouns slow down speech across structurally and culturally diverse languages. Proceedings of the National Academy of Sciences 115(22). 5720–5725.
Shannon, Claude E. 1948. A mathematical theory of communication. The Bell System Technical Journal 27(3). 379–423.
Shiohara, Asako. 2022. Multi-CAST Sumbawa. In Geoffrey Haig & Stefan Schnell (eds.), Multi-CAST: Multilingual corpus of annotated spoken texts. Version 2311. Bamberg: University of Bamberg.
Smith, Mark & Linda Wheeldon. 1999. High level processing scope in spoken sentence production. Cognition 73(3). 205–246.
Thieberger, Nick & Timothy Brickell. 2019. Multi-CAST Nafsan. In Geoffrey Haig & Stefan Schnell (eds.), Multi-CAST: Multilingual corpus of annotated spoken texts. Version 2311. Bamberg: University of Bamberg.
Thompson, Chad L. 1990. On the treatment of topical objects in Chepang: Passive or inverse? Studies in Language 14(2). 405–427.
Tily, Harry & Steven Piantadosi. 2009. Refer efficiently: Use less informative expressions for more predictable meanings. In Kees van Deemter, Albert Gatt, Roger P. van Gompel & Emiel J. Krahmer (eds.), Proceedings of the Workshop on the Production of Referring Expressions: Bridging the gap between computational and empirical approaches to reference (PRE-CogSci 2009), 1–8. Tilburg: Tilburg University.
Venhuizen, Noortje J. & Harm Brouwer. 2025. Referential retrieval and integration in language comprehension: An electrophysiological perspective. Psychological Review. Published online.
Visser, Eline. 2021. Multi-CAST Kalamang. In Geoffrey Haig & Stefan Schnell (eds.), Multi-CAST: Multilingual corpus of annotated spoken texts. Version 2311. Bamberg: University of Bamberg.