In: The Documentarist Turn: From observable linguistic behaviour to typological generalizations
Edited by Sonja Riesberg, Uta Reinöhl and Birgit Hellwig
[Studies in Language Companion Series 240] 2026
pp. 376–396
Chapter 15: Pauses, parts of speech, and word order
A comparative corpus study on 27 languages
This content is being prepared for publication; it may be subject to changes.
Abstract
This study investigates how pauses before nouns and verbs interact with word order and
prosodic structure across a worldwide sample of 27 languages from the DoReCo corpus. Whereas prior research
has suggested that more pauses occur before nouns in general, the results here show that this asymmetry holds
consistently only in verb-final languages, while in other languages, pausing patterns are mixed. Since word
order mediates pausing asymmetries, it likely also plays a major role in explaining prefixing asymmetries — namely, the global dispreference for prefixes on nouns compared to verbs. The study demonstrates the potential
of comparative corpus linguistics for examining typological patterns in speech, especially when incorporating
prosodic annotation.
Article outline
- 1. Introduction
- 2. Data and annotation
- 3. Raw distribution of pauses before nouns and verbs
- 4. Statistical analyses
- 5. Discussion
- 6. Conclusion
- Data availability
- Acknowledgements
- References
- Appendix
