Article published in: Chinese Language and Discourse
Vol. 6:2 (2015), pp. 218–244
Corpus-based Chinese studies
A historical review from the 1920s to the present
Available under the Creative Commons Attribution-NonCommercial (CC BY-NC) 4.0 license.
For any use beyond this license, please contact the publisher at rights@benjamins.nl.
Published online: 28 January 2016
https://doi.org/10.1075/cld.6.2.06xu
This article reviews corpus-based Chinese studies, both applied and theoretical, from the 1920s to the present. It shows that, although corpus-based Chinese studies have gained momentum only in the last two decades or so, the roots of Chinese corpus linguistics go back to the early 20th century. Today most corpus-based Chinese studies are oriented toward applied linguistics, the most popular types of research being the compilation of character and word frequency lists and the study of Chinese interlanguage. Beyond applied linguistics, the overview also highlights innovative corpus studies of lexical and grammatical aspects of both classical and modern Chinese, as well as studies of sociolinguistic variation and discourse pragmatics. Finally, it acknowledges important groundwork in Chinese corpus linguistics and discusses future directions.
References (78)
Ao, Hongde. 1929a. “Yutiwen Yingyong Zihui Yanjiu Baogao: Chen Heqin Shi Yutiwen Yingyong Zihui zhi Xu [A study of characters used in vernacular Chinese: Extending Chen’s character list].” Jiaoyu Zazhi [Journal of Education] 21 (2): 77–101.
Ao, Hongde. 1929b. “Yutiwen Yingyong Zihui Yanjiu Baogao (Xu): Chen Heqin Shi Yutiwen Yingyong Zihui zhi Xu [A study of characters used in vernacular Chinese: Extending Chen’s character list (continued)].” Jiaoyu Zazhi [Journal of Education] 21 (3): 97–113.
Bei, Guiqin, Xuetao Zhang, and …. 1988. Hanzi Pindu Tongji [Frequency calculation of Chinese characters]. Beijing: Publishing House of Electronics Industry.
Bybee, Joan, and Paul Hopper (eds.). 2001. Frequency and the Emergence of Linguistic Structure. Amsterdam: John Benjamins Publishing Company.
Chen, Heqin. 1922. “Yutiwen Yingyong Zihui [Characters used in vernacular Chinese].” Xin Jiaoyu [New Education] 5 (5): 987–995.
Chen, Heqin. 1928. Yutiwen Yingyong Zihui [Characters used in vernacular Chinese]. Shanghai: The Commercial Press.
Chen, Heqin. 2008. “Yutiwen Yingyong Zihui [Characters used in vernacular Chinese].” In Chen Heqin Quanji (Di Liu Juan) [The complete works of Heqin Chen (Volume 6)], ed. by Xiuyun Chen and Yifei Chen, 55–114. Nanjing: Jiangsu Education Press.
Chen, Liang, and Jiansheng Guo. 2010. “From Language Structures to Language Use: A Case from Mandarin Motion Expression Classification.” Chinese Language and Discourse 1 (1): 31–65.
China State Language Commission and China State Bureau of Standards. 1992. Xiandai Hanyu Zipin Tongji Biao [A frequency list of modern Chinese characters]. Beijing: Language and Culture Press.
Chu, Chengzhi, and Xiaohe Chen. 1993. “Jianli Hanyu Zhongjieyu Yuliaoku Xitong de Jiben Shexiang [The initial considerations of creating a Chinese interlanguage corpus system].” Shijie Hanyu Jiaoxue [Chinese Teaching in the World] 7 (3): 199–205.
Cui, Xiliang. 2005. “Oumei Xuesheng Hanyu Jieci Xide de Tedian ji Pianwu Fenxi [The acquisition of Chinese prepositions by European and American learners and analysis of their errors].” Shijie Hanyu Jiaoxue [Chinese Teaching in the World] 19 (3): 83–95.
Cui, Xiliang, and Baolin Zhang (eds.). 2013. Dier Jie Hanyu Zhongjieyu Yuliaoku Jianshe yu Yingyong Guoji Xueshu Taolunhui Lunwen Xuanji [Proceedings of the second international symposium on the construction and application of Chinese interlanguage corpora]. Beijing: Beijing Language and Culture University Press.
Feng, Zhiwei. 2006. “Evolution and Present Situation of Corpus Research in China.” International Journal of Corpus Linguistics 11 (2): 173–207.
Feng, Zhiwei. 2012. Ziran Yuyan Chuli Jianming Jiaocheng [A concise course of natural language processing]. Shanghai: Shanghai Foreign Language Education Press.
Granger, Sylviane. 1996. “From CA to CIA and Back: An Integrated Approach to Computerized Bilingual and Learner Corpora.” In Languages in Contrast: Text-based cross-linguistic studies, ed. by Karin Aijmer, et al., 37–51. Lund: Lund University Press.
Granger, Sylviane. 2002. “A Bird’s-eye View of Learner Corpus Research.” In Computer Learner Corpora, Second Language Acquisition and Foreign Language Teaching, ed. by Sylviane Granger, et al., 3–33. Amsterdam: John Benjamins Publishing Company.
Gu, Yueguo. 2009. “From Real-life Situated Discourse to Video-Stream Data-mining.” International Journal of Corpus Linguistics 14 (4): 433–466.
Hai, Liuwen. 2011. Shisan Jing Zipin Yanjiu [The frequency study of the thirteen Chinese canons]. Beijing: Higher Education Press.
Halliday, Michael. 1959. The Language of the Chinese “Secret History of the Mongols”. Oxford: Basil Blackwell.
Halliday, Michael. 1992. “Language as System and Language as Instance: The Corpus as a Theoretical Construct.” In Directions in Corpus Linguistics: Proceedings of Nobel Symposium 82, ed. by Jan Svartvik, 61–77. Berlin: Mouton de Gruyter.
Hung, William. 1932. Yinde Shuo [On indexing]. Peking: Harvard-Yenching Institute Sinological Index Series, Peking University Library.
Institute of Language Teaching Research at Beijing Language Institute. 1985a. Hanyu Cihui de Tongji yu Fenxi [The statistics and analysis of Chinese words]. Beijing: Foreign Language Teaching and Research Press.
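Institute of Language Teaching Research at Beijing Language Institute. 1985b. Changyong Zi he Changyong Ci [Frequently used characters and words]. Beijing: The Publishing House of Beijing Language Institute.
Institute of Language Teaching Research at Beijing Language Institute. 1988. Xiandai Hanyu Pinlu Cidian [Frequency dictionary of Chinese words]. Beijing: The Publishing House of Beijing Language Institute.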
Lau, Din Cheuk, Ho Che Wah, and Chen Fong Ching (eds.). 1992. A Concordance to Shuoyuan No. 1 (ICS Ancient Chinese Texts Concordance Series). Hong Kong: The Commercial Press.
Li, Fanglan. 2011. Xiandai Hanyu Yuyiyun de Lilun Tansuo yu Xide Yanjiu: Yuliaoku Yuyanxue Shijiao [A theoretical exploration into semantic prosody and its acquisition of modern Chinese: A corpus linguistics perspective]. Unpublished PhD thesis. Minzu University of China.
Li, Jinman, and Fuyun Wu. 2013. “Leixingxue Gaikuo yu Eryu Xuexizhe Hanyu Guanxi Congju Chanchu Yanjiu [Typological generalisations and the study on the production of Chinese relative clauses by second language learners].” Waiyu Jiaoxue yu Yanjiu [Foreign Language Teaching and Research] 45 (1): 80–92.
Li, Jinxi. 1922. “Guoyu zhong Jiben Yuci de Tongji Yanjiu [Statistical considerations of basic vocabulary in Chinese].” Guowen Xuehui Congkan [Journal of Chinese Language Society] 1 (1): 81–84.
Liu, Yuan, Nanyuan Liang, Dejin Wang, Sheying Zhang, Tieying Yang, Chunyu Jie, and Wei Sun. 1990. Xiandai Hanyu Changyong Ci Cipin Cidian [A dictionary of frequency of modern Chinese words]. Beijing: Astronautic Publishing House.
Liu, Yun. 2009. “Hanyu Cihui Tongji Yanjiu Shuping [A review of Chinese vocabulary statistical studies].” Hanyu Xuexi [Chinese Language Learning] 30 (1): 62–69.
Liu, Zhiji. 2009. “Zipin Shijiao de Gu Wenzi Sishu Fenbu Fazhan Yanjiu [Research on the distribution and development of four categories of character construction in ancient writings from the visual angle of character frequency].” Gu Hanyu Yanjiu [Research in Ancient Chinese Language] 22 (4): 2–11.
Louw, Bill. 1993. “Irony in the Text or Insincerity in the Writer? The Diagnostic Potential of Semantic Prosodies.” In Text and Technology: In honour of John Sinclair, ed. by Mona Baker, Gill Francis, and Elena Tognini-Bonelli, 157–176. Amsterdam: John Benjamins Publishing Company.
Lu, Wu, Fuyin Nan, and Shan Chen (eds.). 2000. Yuanchao Mishi Jiaozhu [Collated and annotated secret history of the Mongols]. Jinan: Qilu Publishing House.
Luke, Kang-kwong, and Theodossia-Soula Pavlidou (eds.). 2002. Telephone Calls: Unity and Diversity in the Structure of Telephone Conversations across Languages and Cultures. Amsterdam: John Benjamins Publishing Company.
McCarthy, John, and Alan Prince. 1995. “Prosodic Morphology.” In Handbook of Phonology, ed. by John Goldsmith, 318–366. Oxford: Blackwell.
McEnery, Tony, and Andrew Hardie. 2012. Corpus Linguistics: Method, Theory and Practice. Cambridge: Cambridge University Press.
Pan, Shuguang. 1984. Guji Suoyin Gailun [Indexing of Chinese classics: A general introduction]. Beijing: Catalogs and Documentations Publishing House.
Sentence Pattern Research Group at Beijing Language Institute. 1989a. “Xiandai Hanyu Jiben Juxing [Basic sentence patterns of modern Chinese].” Shijie Hanyu Jiaoxue [Chinese Teaching in the World] 3 (1): 26–35.
Sentence Pattern Research Group at Beijing Language Institute. 1989b. “Xiandai Hanyu Jiben Juxing (Xuyi) [Basic sentence patterns of modern Chinese (Continued I)].” Shijie Hanyu Jiaoxue [Chinese Teaching in the World] 3 (3): 144–148.
Sentence Pattern Research Group at Beijing Language Institute. 1989c. “Xiandai Hanyu Jiben Juxing (Xuer) [Basic sentence patterns of modern Chinese (Continued II)].” Shijie Hanyu Jiaoxue [Chinese Teaching in the World] 3 (4): 211–219.
Sentence Pattern Research Group at Beijing Language Institute. 1990. “Xiandai Hanyu Jiben Juxing (Xusan) [Basic sentence patterns of modern Chinese (Continued III)].” Shijie Hanyu Jiaoxue [Chinese Teaching in the World] 4 (1): 27–33.
Sentence Pattern Research Group at Beijing Language Institute. 1991. “Xiandai Hanyu Jiben Juxing (Xusi) [Basic sentence patterns of modern Chinese (Continued IV)].” Shijie Hanyu Jiaoxue [Chinese Teaching in the World] 5 (1): 23–29.
Siewierska, Anna, Jiajin Xu, and Richard Xiao. 2010. “Bang-le Yi Ge Da Mang (Offered a Big Helping Hand): A Corpus Study of the Splittable Compounds in Spoken and Written Chinese.” Language Sciences 32 (4): 464–487.
Tao, Hongyin. 1996. Units in Mandarin Conversation: Prosody, Discourse, and Grammar. Amsterdam: John Benjamins Publishing Company.
Tao, Hongyin. 2000. “Cong ‘Chi’ Kan Dongci Lunyuan Jiegou de Dongtai Tezheng [‘Eating’ and emergent argument structure].” Yuyan Yanjiu [Language Research] 20 (3): 21–38.
Tao, Zhixing, and Jingnong Zhu. 1923. Pingmin Qianzi Ke [Early Chinese lessons for illiterates]. Shanghai: The Commercial Press.
Teubert, Wolfgang. 2005. “My Version of Corpus Linguistics.” International Journal of Corpus Linguistics 10 (1): 1–13.
Thompson, Sandra, and Hongyin Tao. 2010. “Conversation, Grammar, and Fixedness: Adjectives in Mandarin Revisited.” Chinese Language and Discourse 1 (1): 3–30.
Thorndike, Edward. 1921. The Teacher’s Word Book. New York City: Teachers College, Columbia University.
Tognini-Bonelli, Elena. 2001. Corpus Linguistics at Work. Amsterdam: John Benjamins Publishing Company.
Tsai, Ting Kan. 1922. Laojielao [The interpretation of Dao De Jing based on Dao De Jing texts] (A synthetic study of LaoTzu’s TaoTeChing in Chinese). Beijing: Self-published.
Tsou, Benjamin, and Rujie You. 2007. “‘21 Shiji Huayu Xin Ciyu Cidian’ Bianzhu Ganyan [Reflections on compiling ‘The Dictionary of Chinese Neologisms for the 21st Century’].” Cishu Yanjiu [Lexicographical Studies] 29 (6): 123–128.
Tsou, Benjamin, and Rujie You. 2010. Quanqiu Huayu Xin Ciyu Cidian [An international dictionary of Chinese neologisms]. Beijing: The Commercial Press.
Tsou, Benjamin, Hing-Lung Lin, Terence Chan, Jerome Hu, Ching-hai Chew, and John K.P. Tse. 1997. “A Synchronous Chinese Language Corpus from Different Speech Communities: Construction and Application.” International Journal of Computational Linguistics and Chinese Language Processing 2 (1): 91–104.
Unihan Digital Technology Co., Ltd. 2008. Guji Hanzi Zipin Tongji [Character frequency calculation of classical Chinese]. Beijing: The Commercial Press.
Wang, Chunxia. 2001. Jiyu Yuliaoku de Lihe Ci Yanjiu [A corpus-based study of splittable compounds]. M.A. dissertation, Beijing Language and Culture University.
Wang, Fengyang. 1983. Ci de Pinlu he Zi de Fenhua [Word frequency and character differentiation]. Paper presented at the Second Annual Conference of Chinese Linguistics Society, Hefei, Anhui, May 1983.
Wang, Haifeng. 2011. Xiandai Hanyu Liheci Lixi Xingshi Gongneng Yanjiu [A functional study of the split forms of splittable compounds in Modern Chinese]. Beijing: Peking University Press.
Xiao, Richard, and Tony McEnery. 2004. Aspect in Mandarin Chinese: A Corpus-based Study. Amsterdam: John Benjamins Publishing Company.
Xiao, Richard, Paul Rayson, and Tony McEnery. 2009. A Frequency Dictionary of Mandarin Chinese: Core Vocabulary for Learners. London: Routledge.
Xiao, Xiqiang, and Wangxi Zhang (eds.). 2011. Shoujie Hanyu Zhongjieyu Yuliaoku Jianshe yu Yingyong Guoji Xueshu Taolunhui Lunwen Xuanji [Proceedings of the first international symposium on the construction and application of Chinese interlanguage corpora]. Beijing: World Publishing Corporation.
Xiong, Wenxin. 1996. “Liuxuesheng Ba Zi Jiegou de Biaoxian Fenxi [An analysis of the performance of ba constructions by international students].” Shijie Hanyu Jiaoxue [Chinese Teaching in the World] 10 (1): 80–87.
Xu, Jiajin. 2009. Qingshaonian Hanyu Kouyu zhong Huayu Biaoji de Huayu Gongneng Yanjiu [The use of discourse markers in spoken Chinese of urban teenagers]. Beijing: Foreign Language Teaching and Research Press.
Yang, Shiqiao. 2011. Jiyu Yuliaoku de Hanyu Yihuan Huihua Xiuzheng Yanjiu [A corpus-based study of repair in Chinese doctor–patient conversations]. Unpublished PhD thesis. Shanghai: Shanghai International Studies University.
Zhang, Pu. 1999a. “Guanyu Daguimo Zhenshi Wenben Yuliaoku de Jidian Lilun Sikao [Some theoretical thoughts about the large-scale corpora of authentic texts].” Yuyan Wenzi Yingyong [Applied Linguistics] 8 (1): 34–43.
Zhang, Pu. 1999b. “Guanyu Yugan yu Liutongdu de Sikao [On language sense and degree of circulation].” Yuyan Jiaoxue yu Yanjiu [Language Teaching and Linguistic Studies] 21 (2): 83–96.
Zhou, Shengya. 2007. Soushenji Yuyan Yanjiu [A linguistic study of Soushenji]. Beijing: China Renmin University Press.
Zipf, George. 1935. The Psycho-Biology of Language: An Introduction to Dynamic Philology. Boston: Houghton Mifflin Company.
Cited by (14)
易, 娜
Yu, Guodong, Yaxin Wu, Paul Drew & Chase Wesley Raymond
2024. The DIG Mandarin Conversations (DMC) Corpus. Chinese Language and Discourse. An International and Interdisciplinary Journal 15:1, pp. 105 ff.
Zhang, Haiwei, Peng Sun, Yaowaluk Bianglae & Winda Widiawati
Zhang, Huiyu, Yayu Shi & Haitao Liu
Zhang, Huiyu, Hailing Zhang, Yayu Shi & Yueyu Chen
2024. Features of translation policies on the Chinese mainland (1979–2021). Target. International Journal of Translation Studies 36:2, pp. 276 ff.
Man Kit Lee, Stephen, Hey Wing Liu & Shelley Xiuli Tong
Zhang, Huiyu & Yayu Shi
Hsu, Chan-Chia & Shu-Kai Hsieh
Wu, Shuqiong
Jiang, Shang, Xin Jiang & Anna Siyanova-Chanturia
Chen, Howard Ho-Jan & Hongyin Tao
Hsu, Chan-Chia
This list is based on CrossRef data as of 8 December 2025. Please note that it may not be complete. Sources presented here have been supplied by the respective publishers. Any errors therein should be reported to them.
