Article published in: Chinese as a Second Language (漢語教學研究—美國中文教師學會學報)
Vol. 57:3 (2022), pp. 238–269
Building a corpus of spoken Chinese interlanguage and some results of preliminary analyses
Published online: 27 February 2023
https://doi.org/10.1075/csl.18015.du
Abstract
The corpus of spoken Chinese interlanguage described in this study consists of over one million characters of transcribed
student speech, collected over nearly ten years of study abroad research. The main research method was a comparison with
a similar corpus of Chinese spoken by native speakers. Preliminary analyses show that the learner corpus and the native corpus
share 11 of their 20 most frequent words. Learners used some grammatical function words, such as 把 (bǎ), 了 (le), 它 (tā), and 着 (zhe), less frequently than native speakers did, but used others, such as
我 (wǒ) and 的 (de), much more frequently. Possible explanations for these patterns, pedagogical
implications, and directions for further research are discussed.
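
The kind of comparison summarized in the abstract (overlap between top-20 wordlists, and over- or underuse of individual words relative to a native reference corpus) can be sketched roughly as follows. This is only an illustrative sketch, not the study's own procedure: the function names and the toy token lists are hypothetical, the input is assumed to be already word-segmented, and log-likelihood is used here as one common keyness statistic rather than as the measure the article necessarily reports.

```python
import math
from collections import Counter


def top_n_overlap(learner_tokens, native_tokens, n=20):
    """Words that appear in the top-n frequency lists of both corpora."""
    top_learner = {w for w, _ in Counter(learner_tokens).most_common(n)}
    top_native = {w for w, _ in Counter(native_tokens).most_common(n)}
    return top_learner & top_native


def log_likelihood(a, b, total_a, total_b):
    """Log-likelihood (G2) keyness for one word.

    a, b: observed counts in corpus A and corpus B.
    total_a, total_b: corpus sizes in tokens.
    """
    expected_a = total_a * (a + b) / (total_a + total_b)
    expected_b = total_b * (a + b) / (total_a + total_b)
    g2 = 0.0
    if a > 0:
        g2 += a * math.log(a / expected_a)
    if b > 0:
        g2 += b * math.log(b / expected_b)
    return 2 * g2


def compare_words(learner_tokens, native_tokens, words):
    """Per-word relative frequency (per 10,000 tokens), G2, and direction."""
    learner, native = Counter(learner_tokens), Counter(native_tokens)
    n_l, n_n = len(learner_tokens), len(native_tokens)
    rows = {}
    for w in words:
        a, b = learner[w], native[w]
        rel_l, rel_n = a / n_l * 10_000, b / n_n * 10_000
        if rel_l > rel_n:
            direction = "overuse"
        elif rel_l < rel_n:
            direction = "underuse"
        else:
            direction = "equal"
        rows[w] = (rel_l, rel_n, log_likelihood(a, b, n_l, n_n), direction)
    return rows


# Hypothetical toy data, standing in for segmented corpus text.
learner_tokens = "我 的 我 的 我 很 好 的 我 去".split()
native_tokens = "他 把 书 放 着 了 我 的 它 了".split()

shared = top_n_overlap(learner_tokens, native_tokens, n=20)
table = compare_words(learner_tokens, native_tokens, ["我", "的", "把", "了"])
for word, (rel_l, rel_n, g2, direction) in table.items():
    print(f"{word}: learner={rel_l:.0f}, native={rel_n:.0f}, G2={g2:.2f}, {direction}")
```

Normalizing to a rate per 10,000 tokens matters because the two corpora will rarely be the same size; raw counts alone would not support a claim of over- or underuse.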
摘要
本文描述了以本文作者历时近十年收集的美国大学生在中国留学的口语语料基础上建立的百万余汉字的中介语口语语料库,并报告了初步的分析研究结果。与一个类似的母语口语语料库比较的结果显示,在中介语语料库和母语语料库中使用最频繁的20个词中,有11个
是相同的。学习者使用的 “把, 了, 它, 着” 等语法功能词比母语者少得多,但使用的 “我”、“的” 等词比母语者多得多。本文分析了出现这些现象的可能原因,并提出了教学方面的探讨及对未来研究方向的展望。
关键词: 汉语口语中介语语料库,海外留学,语法功能词,语用能力
Article outline
- Introduction
- Literature review
- Corpus Linguistics in second language acquisition and teaching research
- Spoken corpora of L2 Chinese
- Developing a corpus of spoken Chinese interlanguage
- Chinese learning background of the students
- Program in China
- Data collection
- Stage I
- Stage II
- Stage III
- Building the corpus
- Analyzing data in corpus linguistics research
- Preliminary analyses
- Theoretical background: The Contrastive Interlanguage Analysis (CIA)
- Research Questions
- Method
- Native reference corpus
- Word segmentation
- Research tool
- Results
- Wordlists and some initial observations
- Keywords
- Comparisons of 把, 了, and 着
- 我
- 的
- 它
- 很 vs. 挺
- 是…的
- Discussion
- 我
- 的
- 它
- 很 vs. 挺
- Pedagogical implications
- Helping students address the underuse issue
- Helping students develop pragmatic competence
- Limitations and further research
- Conclusions
- Acknowledgements
- Notes
