Article published in: Chinese as a Second Language (漢語教學研究—美國中文教師學會學報)
Vol. 58:3 (2023), pp. 262–298
A machine learning-based investigation of syntactic complexity measures in second language acquisition of Chinese
Published online: 6 September 2024
https://doi.org/10.1075/csl.22007.shi
Abstract
Research on L2 syntactic complexity has shown that multidimensional measures may assess writing proficiency more accurately than one-dimensional measures. However, the optimal number and combination of measures have not been thoroughly explored. This study addresses that gap with a novel machine learning-based approach that quantitatively evaluates the effectiveness of multidimensional measures and identifies the optimal combination. In an analysis of a dataset comprising 36 L2 Chinese learners, multidimensional syntactic complexity measures outperformed one-dimensional measures in differentiating learners. Specifically, a combination of three measures (Mean Length of Sentence, Mean Length of T-unit, and Mean Length of Clause) achieved the highest classification accuracy. The findings have implications for the development of computer-assisted placement tests and proficiency evaluations, and they offer practical guidance for language teachers designing complexity-based instruction.
Keywords: syntactic complexity, machine learning, multidimensionality
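Abstract (translated from Chinese)
Research on L2 syntactic complexity has shown that assessing learners' writing proficiency with multidimensional measures may be more accurate than with one-dimensional measures. However, the optimal measures and their combination have not been fully explored. This study uses a novel machine learning-based computational approach to quantitatively examine the effectiveness of multidimensional syntactic complexity measures and to determine the optimal combination of measures. Analyzing essay data from 36 L2 Chinese learners, we found that multidimensional syntactic complexity measures outperformed one-dimensional measures in accurately differentiating learners. Specifically, the combination of three measures (mean sentence length, mean T-unit length, and mean clause length) achieved the highest accuracy in classifying learners. The implications of this study include the development of computer-assisted Chinese proficiency tests and evaluations, as well as practical guidance for language teachers designing complexity-focused instruction.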
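The idea of comparing one-dimensional and multidimensional measure sets can be sketched as an exhaustive search over measure combinations, each scored by held-out classification accuracy. The sketch below is a minimal illustration and not the study's method: it swaps in a nearest-centroid classifier with leave-one-out validation instead of the study's models, the helper names (`predict`, `loo_accuracy`, `rank_subsets`) are hypothetical, and the `TOY` rows are invented values that do not come from the study's dataset.

```python
from itertools import combinations

MEASURES = ("MLS", "MLT", "MLC")

# Invented toy rows for illustration only (NOT data from the study).
# Each row: ({measure name: value}, proficiency label)
TOY = [
    ({"MLS": 12.0, "MLT": 9.0, "MLC": 6.0}, "lower"),
    ({"MLS": 14.0, "MLT": 10.0, "MLC": 6.5}, "lower"),
    ({"MLS": 13.0, "MLT": 11.0, "MLC": 7.0}, "lower"),
    ({"MLS": 20.0, "MLT": 15.0, "MLC": 8.0}, "higher"),
    ({"MLS": 22.0, "MLT": 14.0, "MLC": 9.0}, "higher"),
    ({"MLS": 19.0, "MLT": 16.0, "MLC": 8.5}, "higher"),
]


def predict(train, x, subset):
    """Return the label whose class centroid is closest to x on `subset`."""
    by_label = {}
    for feats, label in train:
        by_label.setdefault(label, []).append(feats)
    best_label, best_dist = None, float("inf")
    for label, rows in by_label.items():
        # Class centroid restricted to the chosen measures (no feature scaling).
        centroid = [sum(r[m] for r in rows) / len(rows) for m in subset]
        dist = sum((x[m] - c) ** 2 for m, c in zip(subset, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loo_accuracy(data, subset):
    """Leave-one-out accuracy: hold out each learner once, train on the rest."""
    hits = 0
    for i, (feats, label) in enumerate(data):
        train = data[:i] + data[i + 1:]
        hits += predict(train, feats, subset) == label
    return hits / len(data)


def rank_subsets(data, measures=MEASURES):
    """Score every non-empty combination of measures."""
    scores = {}
    for k in range(1, len(measures) + 1):
        for subset in combinations(measures, k):
            scores[subset] = loo_accuracy(data, subset)
    return scores


scores = rank_subsets(TOY)
for subset, acc in sorted(scores.items(), key=lambda kv: -kv[1]):
    print("+".join(subset), round(acc, 2))
```

With three measures the search covers seven combinations, so exhaustive enumeration is cheap; with a larger measure pool, a greedy or regularized selection would be the more practical choice.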
Article outline
- Introduction
- Literature review
- Syntactic complexity measures in L2 research
- Overall complexity index — Mean Length of T-unit (MLT)
- Clausal complexity via subordination — Mean number of clauses per T-unit (C/T)
- Clausal complexity via coordination — Mean number of T-unit per sentence (T/S)
- Phrasal complexity index — Mean Length of Clause (MLC)
- Syntactic complexity measures in L2 Chinese research
- Examining syntactic complexity multidimensionally
- Research questions
- Methods
- Participants and setting
- Data collection and coding
- Complexity measures used in this study
- Machine learning models
- Results
- Descriptive statistics
- One- and two-measure binary models
- Multinomial models using two or more measures
- Discussion
- Implications and pedagogical significance
- Limitations
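The measures named in the outline and abstract are ratios of unit counts, which can be sketched as below. This is a minimal illustration under stated assumptions: the `Counts` container and `complexity_measures` function are hypothetical names, the length unit (words versus characters) is left as an assumption because this page does not specify it, and the sample counts are invented rather than taken from the study.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Counts:
    """Unit counts for one text (hypothetical container)."""
    length_units: int  # words or characters; the choice is an assumption here
    sentences: int
    t_units: int
    clauses: int


def complexity_measures(c: Counts) -> Dict[str, float]:
    """Compute the ratio-based syntactic complexity measures."""
    if min(c.sentences, c.t_units, c.clauses) <= 0:
        raise ValueError("sentence, T-unit, and clause counts must be positive")
    return {
        "MLS": c.length_units / c.sentences,  # Mean Length of Sentence
        "MLT": c.length_units / c.t_units,    # Mean Length of T-unit
        "MLC": c.length_units / c.clauses,    # Mean Length of Clause
        "C/T": c.clauses / c.t_units,         # clauses per T-unit (subordination)
        "T/S": c.t_units / c.sentences,       # T-units per sentence (coordination)
    }


# Invented counts for illustration only.
sample = Counts(length_units=240, sentences=10, t_units=12, clauses=20)
measures = complexity_measures(sample)
print(measures)
```

Because all five measures share the same four counts, they are not independent: for example, MLT equals MLC multiplied by C/T, which is one reason combinations of measures need to be evaluated jointly instead of one at a time.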
