Article published in: Cumulative Knowledge Building in Learner Corpus Research
Edited by Tove Larsson and Douglas Biber
[International Journal of Learner Corpus Research 11:1] 2025
pp. 17–46
Vocabulary sophistication in children’s L2 school writing
Published online: 25 October 2024
https://doi.org/10.1075/ijlcr.23025.dur
Abstract
This paper tests three hypotheses about written vocabulary in child L2 English: that, as children mature,
(1) the mean frequency values of the nouns they use increase; (2) the mean frequencies of other parts of speech decrease; and (3) the
use of academic vocabulary increases only in certain types of writing. In a corpus of writing by children in Norway, hypothesis
1 was confirmed up to the mid-teenage years, after which the mean frequency values of nouns decreased. Analysis showed that the early
increase is due to decreased repetition of low-frequency topic words. After age 15, frequencies drop as the main source of
vocabulary moves from a region around the 150th most frequent lemma to one around the 550th. Hypotheses 2 and 3 were partially
confirmed. Mean frequencies of non-nouns decreased in non-stories after Year 9. Non-stories became more academic across school
years. Stories had much lower academic vocabulary scores overall but also showed an increase at Year 10.
Article outline
- 1. Introduction
- 1.1 Lexical sophistication
- 1.2 Lexical sophistication and word frequency: POS as a mediating variable
- 1.3 Lexical sophistication and academic vocabulary: The mediating role of text type
- 1.4 Hypotheses
- 2. Methods
- 2.1 Corpus
- 2.2 Measures of vocabulary frequency and academic vocabulary in learner texts
- 2.3 Procedure
- 3. Testing the hypotheses
- 3.1 Hypothesis 1: Mean frequency values for noun lemma tokens increase across year groups
- 3.2 Hypothesis 2: Mean frequency values for verb, adjective and adverb lemma tokens decrease across year groups
- 3.3 Hypothesis 3: Use of academic vocabulary increases across year groups in academic writing, but not in non-academic writing
- 3.4 Exploring the hypotheses further
- 3.4.1 Year 8 vs. Year 10 non-stories
- 3.4.2 Year 11 vs. Year 10 & Year 8 non-stories
- 4. Discussion and conclusions
- Notes
- References
Cited by (2)
Cited by two other publications
Dirdal, Hildegunn & Eva Thue Vold
This list is based on CrossRef data as of 17 November 2025. Please note that it may not be complete. Sources presented here have been supplied by the respective publishers. Any errors therein should be reported to them.
