In: Learner Corpora and Language Teaching
Edited by Sandra Götz and Joybrato Mukherjee
[Studies in Corpus Linguistics 92] 2019
pp. 101–128
Researching learner language through POS keyword and syntactic complexity analyses
Published online: 6 May 2019
https://doi.org/10.1075/scl.92.06per
Abstract
In this paper, we explore the affordances of two different research methods that may be instrumental in analysing learner language complexity: standard corpus linguistics methodology and automatic syntactic complexity analysers. Our results suggest that POS keyword analysis and automatic syntactic analysis are both effective for the identification of linguistic features at different levels of development in instructed SLA. In particular, countable nouns, prepositional phrases, verbs and general adverbs are criterial features that define the transition from lower to higher secondary school language learning in the Spanish component of the ICCI corpus. We suggest that the analysis of complexity in noun phrases is of great interest for researchers and teachers in terms of identifying development milestones in language acquisition.
Article outline
- 1. Introduction
- 2. Research methodology
- 2.1 Data
- 2.2 Research methods
- 3. Contrasting learner corpora (1): POS keyword analysis
- 3.1 Grades 7 and 8
- 3.2 Grades 11 and 12
- 3.3 Grades 7 and 8 vs Grades 11 and 12
- 4. Contrasting learner corpora (2): Automatic syntactic complexity analysis
- 4.1 Grades 7, 8, 11 and 12: Complexity in the noun phrase
- 4.2 Grades 7, 8, 11 and 12: Syntactic sophistication
- 4.2.1 Traditional measures of syntactic complexity
- 4.2.2 Measures of syntactic sophistication
- 4.3 Grades 7 and 8 vs Grades 11 and 12: Complexity in the noun phrase and syntactic sophistication measures
- 5. Discussion and pedagogical implications
- 5.1 RQ (1) Do different groups of learners present distinct linguistic features? Can these features be identified by means of automatic analysis of language?
- 5.2 RQ (2) Do different methods to carry out automatic analysis of language present a similar picture of complexity and language development? How do the research methods in this paper complement each other? How does this complementarity inform language teaching?
- 6. Conclusion and some limitations
- Notes
- References
- Appendix
