Article published in: Studies in Language
Vol. 48:2 (2024), pp. 351–389
Comparing zero and referential choice in eight languages with a focus on Mandarin Chinese
Published online: 1 September 2023
https://doi.org/10.1075/sl.21072.vol
Abstract
Mandarin has a low rate of overtly expressed arguments in all syntactic functions without agreement marking on the verb. It has been claimed that Mandarin exhibits higher rates of zero arguments than other languages. Most previous work has compared Mandarin with English, while comparison with other languages remains a desideratum. This study compares Mandarin with seven languages (Cypriot Greek, English, Northern Kurdish, Sanzhi, Teop, Tondano, Vera’a) taken from Multi-CAST (Haig & Schnell 2019). Results suggest that while Mandarin exhibits more zero arguments than pronouns, this is not unique, with e.g. Cypriot Greek having a higher rate of zero arguments. In addition, a relatively stable rate of lexical expressions can be found across languages, relativising Mandarin’s unique position with regard to referential choice even further.
Article outline
- 1. Is Mandarin discourse less explicit?
- 2. Zero arguments and verb agreement
- 3. Analysing zero arguments using Multi-CAST
- 3.1 Multi-CAST: Multilingual corpus of annotated spoken texts
- 3.2 Overview of the languages
- 3.2.1 Cypriot Greek
- 3.2.2 English
- 3.2.3 Northern Kurdish
- 3.2.4 Sanzhi
- 3.2.5 Teop
- 3.2.6 Tondano
- 3.2.7 Vera’a
- 3.2.8 Mandarin
- 3.3 Comparability of the eight languages
- 4. Coding of the Mandarin corpus
- 5. Results of the usage-based analysis
- 5.1 Rates of zero arguments across languages
- 5.2 Finer-grained analysis of underlying causes
- 5.2.1 Preliminary nature of this analysis
- 5.2.2 Included variables and languages
- 5.2.3 What decision trees are and how they are computed
- 5.2.4 Preliminary results and discussion
- 6. Conclusion
- Data availability
- Acknowledgements
- Notes
- Abbreviations
References (105)
Ackema, Peter & Ad Neeleman. 2007. Restricted pro drop in Early Modern Dutch. The Journal of Comparative Germanic Linguistics 10(81). 81–107.
Ackema, Peter, Patrick Brandt, Maaike Schoorlemmer & Fred Weerman. 2006. The role of agreement in the expression of arguments. In Peter Ackema, Patrick Brandt, Maaike Schoorlemmer & Fred Weerman (eds.), Arguments and agreement, 1–32. Oxford: Oxford University Press.
Adams, Marianne. 1987. From Old French to the theory of pro-drop. Natural Language and Linguistic Theory 5(1). 1–32.
Agouraki, Yoryia. 2010. It-clefts and stressed operators in the preverbal field of Cypriot Greek. Lingua 120. 527–554.
Ariel, Mira. 1996. Referring expressions and the +/− coreference distinction. In Thorstein Fretheim & Jeanette K. Gundel (eds.), Reference and referent accessibility, 13–35. Amsterdam: John Benjamins.
Barbosa, Pilar. 2011a. Pro-drop and theories of pro in the minimalist program part 1: Consistent null subject languages and the pronominal-Agr hypothesis. Language and Linguistics Compass 5(8). 551–570.
Barbosa, Pilar. 2011b. Pro-drop and theories of pro in the minimalist program part 2: Pronoun deletion analyses of null subjects and partial, discourse and semi pro-drop. Language and Linguistics Compass 5(8). 571–587.
Battistella, Edwin. 1985. On the distribution of pro in Chinese. Natural Language and Linguistic Theory 3(3). 317–340.
Bennis, Hans. 2006. Agreement, pro, and imperatives. In Peter Ackema, Patrick Brandt, Maaike Schoorlemmer & Fred Weerman (eds.), Arguments and agreement, 101–123. Oxford: Oxford University Press.
Bickel, Balthasar. 2003. Referential density in discourse and syntactic typology. Language 79(4). 708–736.
Bisang, Walter. 2009. On the evolution of complexity: Sometimes less is more in East and mainland Southeast Asia. In Geoffrey Sampson, David Gil & Peter Trudgill (eds.), Language complexity as an evolving variable, 34–49. Oxford: Oxford University Press.
Bisang, Walter. 2014. Overt and hidden complexity: Two types of complexity and their implications. Poznan Studies in Contemporary Linguistics 50(2). 127–143.
Bisang, Walter. 2015. Hidden complexity: the neglected side of complexity and its implications. Linguistics Vanguard 1(1). 177–187.
Brickell, Timothy C. 2014. A grammatical description of the Tondano (Toundano) language. Melbourne: La Trobe University PhD dissertation.
Brickell, Timothy C. 2016. Multi-CAST Tondano. In Geoffrey Haig & Stefan Schnell (eds.), Multi-CAST: Multilingual corpus of annotated spoken texts ([URL]) (last access 19 July 2023).
Brickell, Timothy C. 2019. Multi-CAST Tondano annotation notes. In Geoffrey Haig & Stefan Schnell (eds.), Multi-CAST: Multilingual corpus of annotated spoken texts ([URL]) (last access 19 July 2023).
Brickell, Timothy C. & Stefan Schnell. 2017. Do grammatical relations reflect information status? Reassessing preferred argument structure theory against discourse data from Tondano. Linguistic Typology 21(1). 177–208.
Brugman, Hennie & Albert Russel. 2004. Annotating multimedia/multi-modal resources with ELAN. In Maria Teresa Lino, Maria Francisca Xavier, Fátima Ferreira, Rute Costa & Raquel Silva (eds.), Proceedings of LREC 2004, Fourth International Conference on Languages Resources and Evaluation, Lisbon, Portugal. 2065–2068.
Butt, Miriam. 2001. Case, agreement, pronoun incorporation and pro-drop in South Asian languages. Paper presented at the Workshop The Role of Agreement in Argument Structure, Utrecht, August 31–September 1, 2001.
Bynon, Theodora. 1979. The ergative construction in Kurdish. Bulletin of the School of Oriental and African Studies 42(2). 211–224.
Cacoullos, Rena Torres & Catherine E. Travis. 2019. Variationist typology: shared probabilistic constraints across (non-)null subject languages. Linguistics 57(3). 653–692.
Chambaz, Antoine & Guillaume Desagulier. 2016. Predicting is not explaining: Targeted learning of the dative alternation. Journal of Causal Inference 4(1). 1–30.
Chen, Ping. 1999. Modern Chinese: History and sociolinguistics. Cambridge: Cambridge University Press.
Chen, Rong. 1995. Communicative dynamism and word order in Mandarin Chinese. Language Sciences 17(2). 201–222.
Creissels, Denis. 2005. A typology of subject and object markers in African languages. In Voeltz, Erhard (ed.), Studies in African linguistic typology [Typological Studies in Language 64], 43–70. Amsterdam: John Benjamins.
Dowle, Matt & Arun Srinivasan. 2019. data.table: Extension of ‘data.frame’. R package version 1.12.2, [URL] (last access 19 July 2023).
ELAN (Version 6.3) [Computer software]. 2022. Nijmegen: Max Planck Institute for Psycholinguistics, The Language Archive. Retrieved from [URL] (last access 19 July 2023).
English Dialects Research Group. 2005. Freiburg English Dialect Corpus (FRED) ([URL]) (last access 19 July 2023).
Foley, William. 2010. Events and serial verb constructions. In Mengistu Amberber, Brett Baker & Mark Harvey (eds.), Complex predicates: Cross-linguistic perspectives on event structure, 79–109. Cambridge: Cambridge University Press.
Forker, Diana. 2020. A grammar of Sanzhi Dargwa [Languages of the Caucasus 2]. Berlin: Language Science Press.
Forker, Diana & Nils N. Schiborr. 2019. Multi-CAST Sanzhi Dargwa. In Geoffrey Haig & Stefan Schnell (eds.), Multi-CAST: Multilingual corpus of annotated spoken texts ([URL]) (last access 19 July 2023).
Hadjidas, Harris & Maria Vollmer. 2015. Multi-CAST Cypriot Greek. In Geoffrey Haig & Stefan Schnell (eds.), Multi-CAST: Multilingual corpus of annotated spoken texts ([URL]) (last access 19 July 2023).
Haig, Geoffrey. 2008. Alignment change in Iranian languages: a construction grammar approach. [Empirical Approaches to Language Typology 37]. Berlin: Mouton de Gruyter.
Haig, Geoffrey. 2017. Deconstructing Iranian ergativity. In Jessica Coon, Diane Massam & Lisa Demena Travis (eds.), The Oxford handbook of ergativity, 465–500. Oxford: Oxford University Press.
Haig, Geoffrey. 2018a. Northern Kurdish (Kurmanji). In Geoffrey Haig & Geoffrey Khan (eds.), The languages and linguistics of Western Asia: An areal perspective, 106–158. Berlin/Boston: Mouton de Gruyter.
Haig, Geoffrey. 2018b. Linguistik des Kurdischen. In Ludwig Paul (ed.), Handbuch der Iranistik, vol. 2, 291–297. Wiesbaden: Harrassowitz.
Haig, Geoffrey & Stefan Schnell. 2014. Annotations using GRAID (Grammatical Relations and Animacy in Discourse). Manual Version 7.0. In Geoffrey Haig & Stefan Schnell (eds.), Multi-CAST: Multilingual corpus of annotated spoken texts ([URL]) (last access 19 July 2023).
Haig, Geoffrey & Stefan Schnell. 2019. Multi-CAST: Multilingual corpus of annotated spoken texts ([URL]) (last access 19 July 2023).
Haig, Geoffrey & Hanna Thiele. 2016. Multi-CAST Northern Kurdish metadata sheet. In Geoffrey Haig & Stefan Schnell (eds.), Multi-CAST: Multilingual corpus of annotated spoken texts ([URL]) (last access 19 July 2023).
Haig, Geoffrey, Nils N. Schiborr & Stefan Schnell. 2020. On potential statistical universals of grammar in discourse: Evidence from Multi-CAST. Paper presented at the Workshop Corpus-based typology: Spoken language from a cross-linguistic perspective, as part of the 42nd Annual Conference of the German Linguistic Society [DGfS], Hamburg, Germany, 4–6 March 2020.
Haig, Geoffrey, Stefan Schnell & Nils N. Schiborr. 2021. Universals of reference in discourse and grammar: Evidence from the Multi-CAST collection of spoken corpora. In Geoffrey Haig, Stefan Schnell & Frank Seifart (eds.), Doing corpus-based typology with spoken language corpora, 141–177. Honolulu, HI: University of Hawai’i Press.
Haig, Geoffrey, Maria Vollmer & Hanna Thiele. 2019a. Multi-CAST Northern Kurdish. In Geoffrey Haig & Stefan Schnell (eds.), Multi-CAST: Multilingual corpus of annotated spoken texts ([URL]) (last access 19 July 2023).
Haig, Geoffrey, Maria Vollmer & Hanna Thiele. 2019b. Multi-CAST Northern Kurdish annotation notes. In Geoffrey Haig & Stefan Schnell (eds.), Multi-CAST: Multilingual corpus of annotated spoken texts ([URL]) (last access 19 July 2023).
Hammarström, Harald, Robert Forkel, Martin Haspelmath & Sebastian Bank. 2020. Glottolog 4.3. Jena: Max Planck Institute for the Science of Human History ([URL]) (last access 19 July 2023).
Harrell, Frank E. Jr. 2019. rms: Regression modeling strategies. R package version 5.1–3, [URL] (last access 19 July 2023).
Hill, Clair. 2018. Person reference and interaction in Umpila/Kuuku Ya’u narrative [MPI Series in Psycholinguistics 141]. Nijmegen: Radboud University PhD dissertation.
Holmberg, Anders & Ian Roberts. 2013. The syntax-morphology relation. Lingua Special Issue Syntax and Cognition: Core Ideas and Results in Syntax 130. 111–131.
Holmberg, Anders, Aarti Nayudu & Michelle Sheehan. 2009. Three partial null-subject languages: A comparison of Brazilian Portuguese, Finnish, and Marathi. Studia Linguistica: Special Issue: Partial Pro-Drop 63(1). 59–97.
Holmberg, Anders. 2005. Is there a little pro? Evidence from Finnish. Linguistic Inquiry 36(4). 533–564.
Huang, James. 1984. On the distribution and reference of empty pronouns. Linguistic Inquiry 15(4). 531–574.
Huang, Yan. 1992. Against Chomsky’s typology of empty categories. Journal of Pragmatics 17(1). 1–29.
Koeneman, Olaf. 2006. Deriving the difference between full and partial pro-drop. In Peter Ackema, Patrick Brandt, Maaike Schoorlemmer & Fred Weerman (eds.), Arguments and agreement, 76–100. Oxford: Oxford University Press.
König, Ekkehard & Volker Gast. 2006. Focused assertion of identity: A typology of intensifiers. Linguistic Typology 10. 223–276.
Li, Charles N. & Sandra A. Thompson. 1974. An explanation of word order change SVO → SOV. Foundations of Language 12(2). 201–214.
Li, Charles N. & Sandra A. Thompson. 1979. Third-person pronouns and zero-anaphora in Chinese discourse. In Talmy Givón (ed.), Discourse and syntax [Syntax and Semantics 12], 311–336. New York, NY: Academic Press.
Li, Xiaoshi. 2012. Variation of subject pronominal expression in Mandarin Chinese. Sociolinguistic Studies 6(1). 91–119.
Li, Xiaoshi & Robert Bayley. 2018. Lexical frequency and syntactic variation. Subject pronoun use in Mandarin Chinese. Asia-Pacific Language Variation 4(2). 135–160.
Liu, Chi-Ming Louis. 2014. A modular theory of radical pro drop. Cambridge, MA: Harvard University PhD dissertation.
Mahalingappa, Laura J. 2013. The acquisition of split-ergative case marking in Kurmanji Kurdish. In Edith L. Bavin & Sabine Stoll (eds.), Acquisition of Ergativity, 239–270. Amsterdam: John Benjamins.
Milborrow, Stephen. 2019. rpart.plot: Plot ’rpart’ models: An enhanced version of ’plot.rpart’. R package version 3.0.7, [URL] (last access 19 July 2023).
Mosel, Ulrike. 2019. Multi-CAST Teop annotation notes. In Geoffrey Haig & Stefan Schnell (eds.), Multi-CAST: Multilingual corpus of annotated spoken texts ([URL]) (last access 19 July 2023).
Mosel, Ulrike & Stefan Schnell. 2015. Multi-CAST Teop. In Geoffrey Haig & Stefan Schnell (eds.), Multi-CAST: Multilingual corpus of annotated spoken texts ([URL]) (last access 19 July 2023).
Mosel, Ulrike & Yvonne Thiesen. 2007. The Teop sketch grammar. Unpublished manuscript. Kiel: University of Kiel ([URL], last access 19 July 2023).
Nagaya, Naonori. 2006. Preferred referential expressions in Tagalog. Tokyo University Linguistics Papers 25. 83–106.
Neeleman, Ad & Kriszta Szendröi. 2005. Pro drop and pronouns. In John Alderete, Chung-hye Han & Alexei Kochetov (eds.), Proceedings of the 24th West Coast Conference on Formal Linguistics, 299–307. Somerville, MA: Cascadilla Proceedings Project.
Neuwirth, Erich. 2014. RColorBrewer: ColorBrewer palettes. R package version 1.1–2, [URL] (last access 19 July 2023).
Ooms, Jeroen. 2019. curl: A modern and flexible web client for R. R package version 3.3, [URL] (last access 19 July 2023).
Pavlou, Natalia. 2018. Morphosyntactic dependencies and verb movement in Cypriot Greek. Chicago, IL: University of Chicago dissertation.
Perlmutter, David. 1971. Deep and surface structure constraints in syntax. New York, NY: Holt, Rinehart and Winston.
Pu, Ming-Ming. 1995. Anaphoric patterning in English and Mandarin narrative production. Discourse Processes 19(2). 279–300.
Pu, Ming-Ming. 1997. Zero anaphora and grammatical relations in Mandarin. In Talmy Givón (ed.), Grammatical relations: A functionalist perspective [Typological Studies in Language 35], 281–322. Amsterdam: John Benjamins.
R Core Team. 2019. R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing, [URL] (last access 19 July 2023).
Radatz, Hans-Ingo. 2008. Non-lexical core-arguments in Basque, Romance and German: How (and why) Spanish syntax is shifting towards sentential head-marking and morphological cross-reference. In Richard Waltereit & Ulrich Detges (eds.), The paradox of grammatical change: Perspectives from Romance, 181–215. Amsterdam: John Benjamins.
Roberts, Ian & Anders Holmberg. 2010. Introduction: Parameters in minimalist theory. In Theresa Biberauer, Anders Holmberg, Ian Roberts & Michelle Sheehan (eds.), Parametric variation: Null subjects in minimalist theory, 1–57. Cambridge: Cambridge University Press.
RStudio Team. 2018. RStudio: Integrated development environment for R. Boston, MA: RStudio, Inc., [URL] (last access 19 July 2023).
Schiborr, Nils N. 2015. Multi-CAST English. In Geoffrey Haig & Stefan Schnell (eds.), Multi-CAST: Multilingual corpus of annotated spoken texts ([URL]) (last access 26 July 2023).
Schiborr, Nils N. 2016. Multi-CAST corpus overview and description. In Geoffrey Haig & Stefan Schnell (eds.), Multi-CAST: Multilingual corpus of annotated spoken texts ([URL]) (last access 19 July 2023).
Schiborr, Nils N. 2017. Antecedent distance and the accessibility hierarchy: A quantitative approach. Bamberg: University of Bamberg MA thesis.
Schiborr, Nils N. 2019. Multi-CAST collection overview. In Geoffrey Haig & Stefan Schnell (eds.), Multi-CAST: Multilingual corpus of annotated spoken texts ([URL]) (last access 19 July 2023).
Schiborr, Nils N., Stefan Schnell & Hanna Thiele. 2018. RefIND – Referent indexing in natural-language discourse: Annotation guidelines v1.1. In Geoffrey Haig & Stefan Schnell (eds.), Multi-CAST: Multilingual corpus of annotated spoken texts ([URL]) (last access 19 July 2023).
Schnell, Stefan. 2015. Multi-CAST Vera’a. In Geoffrey Haig & Stefan Schnell (eds.), Multi-CAST: Multilingual corpus of annotated spoken texts ([URL]) (last access 19 July 2023).
Schnell, Stefan. 2019. Multi-CAST Vera’a annotation notes. In Geoffrey Haig & Stefan Schnell (eds.), Multi-CAST: Multilingual corpus of annotated spoken texts ([URL]) (last access 19 July 2023).
Schnell, Stefan & Danielle Barth. 2018. Discourse motivations for pronominal and zero objects across registers in Vera’a. Language Variation and Change 30(1). 51–81.
Sessarego, Sandro & Javier Gutiérrez-Rexach. 2017. Revisiting the null subject parameter: New insights from Afro-Peruvian Spanish. Isogloss 3(1). 43–68.
Speas, Margaret. 2006. Economy, agreement, and the representation of null arguments. In Peter Ackema, Patrick Brandt, Maaike Schoorlemmer & Fred Weerman (eds.), Arguments and agreement, 35–75. Oxford: Oxford University Press.
Sun, Chao-Fen & Talmy Givón. 1985. On the so-called SOV word order in Mandarin Chinese: A quantified text study and its implications. Language 61(2). 329–351.
Therneau, Terry & Beth Atkinson. 2019. rpart: Recursive partitioning and regression trees. R package version 4.1–15, [URL] (last access 19 July 2023).
Vollmer, Maria. 2019a. Is there a breakdown of ergativity in Northern Kurdish, and what factors influence it? Talk presented at the 4th International Conference on Kurdish Linguistics, Rouen, September 2–3, 2019.
Vollmer, Maria. 2019b. How radical is pro-drop in Mandarin? A quantitative corpus study on referential choice in Mandarin Chinese. Bamberg: University of Bamberg MA thesis.
Vollmer, Maria. 2020a. Multi-CAST Mandarin. In Geoffrey Haig & Stefan Schnell (eds.), Multi-CAST: Multilingual corpus of annotated spoken texts ([URL]) (last access 19 July 2023).
Vollmer, Maria. 2020b. Multi-CAST Mandarin annotation notes. In Geoffrey Haig & Stefan Schnell (eds.), Multi-CAST: Multilingual corpus of annotated spoken texts ([URL]) (last access 19 July 2023).
Wickham, Hadley. 2016. ggplot2: Elegant graphics for data analysis. New York: Springer ([URL]) (last access 19 July 2023).
Wratil, Melani. 2011. Uncovered pro: On the development and identification of null subjects. In Melani Wratil & Peter Gallmann (eds.), Null pronouns [Studies in Generative Grammar 106], 99–140. Berlin, Boston: Mouton de Gruyter.
Cited by (1)
Egurtzegi, Aitor, Damián E. Blasi, Sebastian Sauppe, Balthasar Bickel & Stefan Schnell
This list is based on CrossRef data as of 2 December 2025. Please note that it may not be complete. Sources presented here have been supplied by the respective publishers. Any errors therein should be reported to them.
