Article published in: The Grammar of Canonical and Non-canonical Wh-constructions
Edited by C.-T. James Huang
[Concentric 51:2] 2025
► pp. 303–348
Climbing towards the NLU of the universal reading of shei ‘who’
Available under the Creative Commons Attribution-NonCommercial (CC BY-NC) 4.0 license.
For any use beyond this license, please contact the publisher at rights@benjamins.nl.
Published online: 6 November 2025
https://doi.org/10.1075/consl.24041.chu
Abstract
The wh-expressions shei ‘who’ and shenme ‘what’ in Mandarin
Chinese not only convey an interrogative meaning but also exhibit existential and universal readings in specific contexts (Huang 1982; Cheng 1991, 1995; Li 1992; Tsai 1994; Lin 1996, 1998). Focusing
on the universal interpretation of shei, this paper has three objectives. First, we demonstrate that current
state-of-the-art large language models (LLMs), such as ChatGPT, lack reliability in distinguishing these three distinct readings of
shei. Second, we develop a specialized natural language processing and understanding (NLP/NLU) system capable
of processing and interpreting shei across diverse contexts with greater accuracy, transparency, and consistency.
Unlike current LLMs, our system is built upon Wang et al.’s (2019a, 2019b) generative linguistics-based NLP/NLU software tools, Articut and Loki, enabling it
to require significantly less training data to interpret the universal reading of shei. Third, we compare our
model’s performance with that of ChatGPT, demonstrating its superior accuracy and robustness in interpreting the universal
reading of shei.
Article outline
- 1. Introduction
- 2. A concise review on the universal interpretation of shei and shenme
- 3. Proposal
- 3.1 Articut and Loki
- 3.2 The procedure of building a language model for interpreting shei ‘who’
- 3.2.1 Collecting, preprocessing and annotating the data
- 3.2.2 CWS/POS tagging via Articut and sentence pattern matching via Loki
- 3.2.3 Deploying, training, and adjusting the model
- 3.3 The algorithm of the language model
- 4. A comparison between our model and ChatGPT-4o
- 4.1 Testing our model
- 4.2 Testing ChatGPT-4o
- 4.3 Results
- 4.4 Discussion
- 5. Limitations of our model
- 6. Concluding remarks
- Acknowledgements
- Notes
- List of abbreviations
References
Atil, Berk, Alexa Chittams, Liseng Fu, Ferhan Ture, Lixinyu Xu, and Breck Baldwin. 2024. LLM Stability: A Detailed Analysis with some Surprises. Retrieved November 1, 2024, from [URL]
Attali, Yigal, and Maya Bar-Hillel. 2003. Guess where: The position of correct answers in multiple-choice test items as a psychometric variable. Journal of Educational Measurement 40.2:109–128.
Bender, Emily M. 2013. Linguistic Fundamentals for Natural Language Processing: 100 Essentials from Morphology and Syntax. San Rafael, CA: Morgan & Claypool Publishers.
Bender, Emily M., and Alexander Koller. 2020. Climbing towards NLU: On meaning, form, and understanding in the age of data. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, ed. by Dan Jurafsky, Joyce Chai, Natalie Schluter and Joel Tetreault, 5185–5198. Seattle, WA: Association for Computational Linguistics.
Berent, Iris, and Gary Marcus. 2019. No integration without structured representations: Response to Pater. Language 95.1:75–86.
Berwick, Robert C., Noam Chomsky, and Massimo Piattelli-Palmarini. 2013. Poverty of the stimulus stands: Why recent challenges fail. Rich Languages from Poor Inputs, ed. by Massimo Piattelli-Palmarini and Robert C. Berwick, 19–42. New York & Oxford: Oxford University Press.
Blair-Stanek, Andrew, and Benjamin van Durme. 2025. LLMs Provide Unstable Answers to Legal Questions. Retrieved November 1, 2024, from [URL]
Burch, Robert. 2001. Charles Sanders Peirce. Stanford Encyclopedia of Philosophy, ed. by Edward Zalta and Uri Nodelman. Retrieved November 1, 2024, from [URL]
Chen, Haifeng. 2012. Lun feizhenxing xunwen “shei” tezhi yiwenju [On non-veridical interrogatives with “who” specificity]. Qiqihaer Daxue Xuebao [Journal of Qiqihar University] 6:86–87.
Chen, Lei, Bobo Li, Li Zheng, Haining Wang, Zixiang Meng, Runfeng Shi, Hao Fei, Jun Zhou, Fei Li, Chong Teng, and Donghong Ji. 2024. What factors influence LLMs’ Judgments? A case study on question answering. Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), ed. by Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti and Nianwen Xue, 17473–17485. Torino, Italy: European Language Resources Association (ELRA) and International Committee on Computational Linguistics (ICCL).
Cheng, Chieh-Chih. 2014. A Developmental Study on the Non-Interrogative Interpretations of Mandarin Wh-words. MA thesis, National Tsing Hua University, Hsinchu.
Cheng, Lisa Lai-Shen. 1991. On the Typology of Wh-questions. Doctoral Dissertation, Massachusetts Institute of Technology, Cambridge, MA.
Cheng, Lisa Lai-Shen, and Cheng-Teh James Huang. 1996. Two types of donkey sentences. Natural Language Semantics 4.2:121–163.
Cheng, Lisa Lai-Shen, and Cheng-Teh James Huang. 2020. Revisiting donkey anaphora in Mandarin Chinese: A reply to Pan and Jiang (2015). International Journal of Chinese Linguistics 7.2:167–186.
Chomsky, Noam. 1970. Remarks on Nominalization. Readings in English Transformational Grammar, ed. by Roderick Jacobs and Peter Rosenbaum, 184–221. Waltham, MA: Ginn & Co.
Chomsky, Noam. 1973. Conditions on Transformations. A Festschrift for Morris Halle, ed. by Stephen R. Anderson and Paul Kiparsky, 232–286. New York: Holt, Rinehart and Winston.
Cui, Songren, and Kuo-Ming Sung. 2022. Negations and questions. A Reference Grammar for Teaching Chinese: Syntax and Discourse, ed. by Songren Cui and Kuo-Ming Sung, 71–115. Singapore: Springer Publishing.
Dentella, Vittoria, Fritz Günther, and Evelina Leivada. 2023. Systematic testing of three Language Models reveals low language accuracy, absence of response stability, and a yes-response bias. Proceedings of the National Academy of Sciences (PNAS), vol. 120.51, ed. by May Berenbaum, article number: e2309583120. Washington, D.C.: National Academy of Sciences (NAS).
Diebold, Francis X. 2012. On the origin(s) and development of the term “Big Data.” PIER Working Paper, ed. by Penn Institute for Economic Research, article number: 12–037. Philadelphia, PA: University of Pennsylvania.
Douven, Igor. 2017. Peirce on abduction. Stanford Encyclopedia of Philosophy, ed. by Edward Zalta and Uri Nodelman. Retrieved November 1, 2024, from [URL]
Everaert, Martin B., Marinus Antonius Christianus Huybregts, Noam Chomsky, Robert C. Berwick, and Johan J. Bolhuis. 2015. Structures, not strings: Linguistics as part of the cognitive sciences. Trends in Cognitive Sciences 19.12:729–743.
Fodor, Jerry A., and Zenon W. Pylyshyn. 1988. Connectionism and cognitive architecture: A critical analysis. Cognition 28.1–2:3–71.
Gao, Wencheng, and Xiaofeng Zhang. 2021. A study of negative polarity items in Chinese existential sentences. Linguistics and Literature Studies 9.1:12–21.
Gundersen, Odd Erik, and Sigbjørn Kjensmo. 2018. State of the art: reproducibility in artificial intelligence. Proceedings of the 32nd AAAI Conference on Artificial Intelligence and 30th Innovative Applications of Artificial Intelligence Conference and 8th AAAI Symposium on Educational Advances in Artificial Intelligence, ed. by Sheila McIlraith and Kilian Weinberger, 1644–1651. Washington, D.C.: AAAI Press.
Hendrycks, Dan, Collin Burns, Steven Basart, Andy Zou, Mantas Mazeika, Dawn Song, and Jacob Steinhardt. 2020. Measuring Massive Multitask Language Understanding. Retrieved November 1, 2024, from [URL]
Huang, Cheng-Teh James. 1982. Logical Relations in Chinese and the Theory of Grammar. Doctoral Dissertation, Massachusetts Institute of Technology, Cambridge, MA.
Huang, Rui-Heng Ray. 2012. On two types of existential subjects in Chinese A-not-A questions. Language and Linguistics 13.6:1171–1210.
Huang, Haiquan, Peng Zhou, and Stephen Crain. 2018. Wh-questions, universal statements and free choice inferences in child Mandarin. Journal of Psycholinguistic Research 47.6:1391–1409.
Kambhampati, Subbarao. 2024. Can large language models reason and plan? Annals of the New York Academy of Sciences 1534.1:15–18.
Lake, Brenden, and Marco Baroni. 2018. Generalization without systematicity: On the compositional skills of sequence-to-sequence recurrent networks. Proceedings of the 35th International Conference on Machine Learning, ed. by Jennifer Dy and Andreas Krause, 4487–4499. Stockholm, Sweden: International Machine Learning Society (IMLS).
Lee, Hun-Tak Thomas. 1986. Studies on Quantification in Chinese. Doctoral Dissertation, University of California, Los Angeles.
Leivada, Evelina, Elliot Murphy, and Gary Marcus. 2023. DALL-E 2 fails to reliably capture common syntactic processes. Social Sciences & Humanities Open 8.1:1–10.
Leivada, Evelina, Gary Marcus, Fritz Günther, and Elliot Murphy. 2024a. A Sentence is Worth a Thousand Pictures: Can Large Language Models Understand Hum4n L4ngu4ge and the W0rld behind W0rds? Retrieved November 1, 2024, from [URL]
Leivada, Evelina, Vittoria Dentella, and Fritz Günther. 2024b. Evaluating the language abilities of Large Language Models vs. humans: Three caveats. Biolinguistics 18:1–12.
Li, Yen-Hui Audrey. 1992. Indefinite wh in Mandarin Chinese. Journal of East Asian Linguistics 1.2:125–155.
Lin, Jo-Wang. 1996. Polarity licensing and wh-phrase quantification in Chinese. Doctoral dissertation, University of Massachusetts at Amherst, Amherst, MA.
Lin, Jo-Wang. 1998. On existential polarity wh-phrases in Chinese. Journal of East Asian Linguistics 7.3:219–255.
Lin, Jo-Wang. 2004. Choice functions and scope of existential polarity wh-phrases in Mandarin Chinese. Linguistics and Philosophy 27.4:451–491.
Lin, Jo-Wang. 2014. Wh-expressions in Mandarin Chinese. The Handbook of Chinese Linguistics, ed. by Cheng-Teh James Huang, Yen-Hui Audrey Li and Andrew Simpson, 180–207. Hoboken, NJ: John Wiley & Sons.
Linzen, Tal. 2019. What can linguistics and deep learning contribute to each other? Response to Pater. Language 95.1:99–108.
Linzen, Tal, and Marco Baroni. 2021. Syntactic structure from deep learning. Annual Review of Linguistics 7.1:195–212.
Liu, Mingming. 2019. Unifying universal and existential wh’s in Mandarin. Proceedings of the 29th Semantics and Linguistic Theory Conference (SALT-29), ed. by Katherine Blake, Forrest Davis, Kaelyn Lamp and Joseph Rhyne, 258–278. Los Angeles, CA: University of California.
Lu, Sin-En, Bo-Han Lu, Chao-Yi Lu, and Richard Tzong-Han Tsai. 2022. Exploring methods for building dialects-Mandarin code-mixing corpora: A case study in Taiwanese Hokkien. Findings of the Association for Computational Linguistics: EMNLP 2022, ed. by Yoav Goldberg, Zornitsa Kozareva and Yue Zhang, 6287–6305. Abu Dhabi, United Arab Emirates: Association for Computational Linguistics.
Marcus, Gary F. 2001. The Algebraic Mind: Integrating Connectionism and Cognitive Science. Cambridge, MA: MIT Press.
Marcus, Gary F. 2018. Deep Learning: A Critical Appraisal. Retrieved November 1, 2024, from [URL]
Marcus, Gary F., Ursula Brinkmann, Harald Clahsen, Richard Wiese, and Steven Pinker. 1995. German inflection: The exception that proves the rule. Cognitive Psychology 29.3:189–256.
Marcus, Gary F., and Ernest Davis. 2019. Rebooting AI: Building Artificial Intelligence we can Trust. New York & London: Vintage Books.
Mirzadeh, Iman, Keivan Alizadeh, Hooman Shahrokhi, Oncel Tuzel, Samy Bengio, and Mehrdad Farajtabar. 2024. GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models. Retrieved November 1, 2024, from [URL]
Murphy, Elliot, and Evelina Leivada. 2022. A model for learning strings is not a model of language. Proceedings of the National Academy of Sciences (PNAS), vol. 119.23, ed. by May Berenbaum, article number: e2201651119. Washington, D.C.: National Academy of Sciences (NAS).
OpenAI. 2023. GPT-4 Technical Report. Retrieved November 1, 2024, from [URL]
OpenAI. 2024. GPT-4. Retrieved November 1, 2024, from [URL]
Pater, Joe. 2019. Generative linguistics and neural networks at 60: Foundation, friction, and fusion. Language 95.1:41–74.
Pinker, Steven, and Alan Prince. 1988. On language and connectionism: Analysis of a parallel distributed processing model of language acquisition. Cognition 28.1–2:73–193.
Qin, Tian, Naomi Saphra, and David Alvarez-Melis. 2024. Sometimes I am a Tree: Data Drives Unstable Hierarchical Generalization. Retrieved November 1, 2024, from [URL]
Renze, Matthew, and Erhan Guven. 2024. The effect of sampling temperature on problem solving in large language models. Findings of the Association for Computational Linguistics: EMNLP 2024, ed. by Yaser Al-Onaizan, Mohit Bansal and Yun-Nung Chen, 7346–7356. Miami, FL: Association for Computational Linguistics.
Saffran, Jenny R., Richard N. Aslin, and Elissa L. Newport. 1996. Statistical learning by 8-month-old infants. Science 274.5294:1926–1928.
Shan, Wei. 2010. Tezhiwen biao fouding yongfa yanjiu [The study on the negative usage of wh-questions]. Jiamusi Daxue Shehuikexue Xuebao [Journal of Social Science of Jiamusi University] 28.5:63–65.
Shi, Hongli. 2021. Hanyu feiyiwen yongfa yiwenci de jufa weizhi ji yuyi fenxi [The syntactic positions and semantics of non-interrogative wh-items in Mandarin Chinese]. Zhongguo Yuwen Tongxun [Current Research in Chinese Linguistics] 100.1:41–53.
Stowell, Tim. 1981. Origins of Phrase Structure. Doctoral dissertation, Massachusetts Institute of Technology, Cambridge, MA.
Su, Yi Esther, Yu Jin, Guo-Bin Wan, Ji-Shui Zhang, and Lin-Yan Su. 2014. Interpretation of wh-words in Mandarin-speaking high-functioning children with autism spectrum disorders. Research in Autism Spectrum Disorders 8.10:1364–1372.
Tang, Ke. 2011. Zhiren Yiwenci Gongxian Xianxiang de Renzhi Yanjiu [A Cognitive Research on the Co-occurrence of Interrogatives Denoting Persons]. MA thesis, Hunan Normal University, Hunan, China.
Trinh, Trieu H., and Minh-Thang Luong. 2024. AlphaGeometry: An Olympiad-level AI system for Geometry. London: Google DeepMind.
Tsai, Wei-Tien Dylan. 1994. On Economizing the Theory of A-Bar Dependencies. Doctoral dissertation, Massachusetts Institute of Technology, Cambridge, MA.
Tsai, Wei-Tien Dylan. 2001. On subject specificity and theory of syntax-semantics interface. Journal of East Asian Linguistics 10.2:129–168.
Wang, Peiyi, Lei Li, Liang Chen, Dawei Zhu, Binghuai Lin, Yunbo Cao, Qi Liu, Tianyu Liu, and Zhifang Sui. 2023. Large language models are not fair evaluators. Findings of the Association for Computational Linguistics: ACL 2024, ed. by Lun-Wei Ku, Andre Martins and Vivek Srikumar, 9440–9450. Bangkok, Thailand: Association for Computational Linguistics (ACL).
Wang, Wen-jet, Chia-jung Chen, Chia-ming Lee, Chien-yu Lai, and Hsin-hung Lin. 2019a. Articut: Chinese Word Segmentation and POS Tagging System [Computer program]. Retrieved February 1, 2024, from [URL]
Wang, Wen-jet, Chia-jung Chen, Chia-ming Lee, Chien-yu Lai, and Hsin-hung Lin. 2019b. Linguistics-Oriented Keyword Interface NLU System [Computer program]. Retrieved February 1, 2024, from [URL]
Wang, Yaxue, and Zhaoting Li. 2013. Guoyu yiwenci de feiyiwen yongfa ertong xide yanjiu — yi “shenme” han “shei” wei li [A study on child acquisition of non-interrogative use of Chinese wh-words “shenme” and “shei”]. Shaoguan Xueyuan Xuebao [Journal of Shaoguan College] 34.9:132–138.
Wei, Sheng-Lun, Cheng-Kuang Wu, Hen-Hsen Huang, and Hsin-Hsi Chen. 2024. Unveiling selection biases: Exploring order and token sensitivity in Large Language Models. Findings of the Association for Computational Linguistics: ACL 2024, ed. by Lun-Wei Ku, Andre Martins and Vivek Srikumar, 5598–5621. Bangkok, Thailand: Association for Computational Linguistics (ACL).
Wu, Tianyu, Shizhu He, Jingping Liu, Siqi Sun, Kang Liu, Qing-Long Han, and Yang Tang. 2023. A brief overview of ChatGPT: The history, status quo and potential future development. IEEE/CAA Journal of Automatica Sinica 10.5:1122–1136.
Xie, Zhiguo. 2007. Nonveridicality and existential polarity wh-phrases in Mandarin. Proceedings from the 43rd Annual Meeting of the Chicago Linguistic Society, ed. by Malcolm Elliott, James Kirby, Osamu Sawada, Eleni Staraki and Suwon Yoon, 121–135. Chicago, IL: Chicago Linguistic Society.
Yang, Chung-Yu Barry. 2024. Revisiting sentence-final adjunct WHAT. Language and Linguistics 25.1:162–186.
Yang, Charles. 2004. Universal Grammar, Statistics, or both. Trends in Cognitive Sciences 8.10:451–456.
Yang, Yang, Leticia Pablos, and Lisa Lai-Shen Cheng. 2023. The processing mechanisms of Mandarin wh-questions. Journal of Chinese Linguistics 51.1:147–171.
Zhang, Junge. 2006. “Shei,” “nage(ren),” “shenmeren” zhi yitong [The similarities and differences of “who”, “which person”, and “what person”]. Fuyang Shifan Xueyuan Xuebao [Journal of Fuyang Normal University] 6:69–72.
Zheng, Lianmin, Wei-Lin Chiang, Ying Sheng, Siyuan Zhuang, Zhanghao Wu, Yonghao Zhuang, Zi Lin, Zhuohan Li, Dacheng Li, and Eric Xing. 2023. Judging LLM-as-a-judge with MT-bench and chatbot arena. Retrieved November 1, 2024, from [URL]
