Profiling English sentences based on CEFR levels

Uchida, Satoru; Arase, Yuki; Kajiwara, Tomoyuki

doi:10.1075/itl.22018.uch

Article published in: Graded Resources for Second and Foreign Language Learning
Edited by David Alfter and Thomas François
ITL - International Journal of Applied Linguistics 175:1 (2024), pp. 103–126

Satoru Uchida | Kyushu University

Yuki Arase | Osaka University

Tomoyuki Kajiwara | Ehime University

Available under the Creative Commons Attribution (CC BY) 4.0 license.

For any use beyond this license, please contact the publisher at rights@benjamins.nl.

This article was made Open Access under a CC BY 4.0 license through payment of an APC by or on behalf of the authors.

Published online: 22 March 2024

Abstract

The study aims to demonstrate the procedure for constructing the CEFR-based Sentence Profile (CEFR-SP), a dataset in which CEFR levels are assigned to individual sentences, and to identify the characteristics of each level. Basic statistics such as word length and sentence length are presented for each CEFR level for 7,511 carefully selected sentences, and statistical tests are conducted between adjacent levels to identify criterial features. The findings reveal significant differences in word length between adjacent levels, while word difficulty does not significantly discriminate between the levels at either end of the scale (A1–A2, C1–C2). Sentence length and depth are likewise not significant discriminators at the higher levels (B2–C1, C1–C2). Notably, sentence-level data generally show greater discriminative power than text-level statistics, indicating that they capture the characteristics of each CEFR level more directly.

Keywords: CEFR, sentence-based annotation, language resource, criterial features, English education
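The abstract summarizes the analysis (basic statistics per CEFR level, then tests between adjacent levels) without showing how such a comparison is computed. The following is a minimal Python sketch of that kind of pipeline, under stated assumptions: whitespace tokenization stands in for a proper NLP toolkit such as Stanza; only two features are computed (sentence length in tokens and mean word length in characters); and the adjacent-level test is the Dwass-Steel-Critchlow-Fligner pairwise rank statistic, chosen here because the reference list cites Dwass (1960) and Steel (1960), not because the abstract confirms it. All function names (`sentence_features`, `features_by_level`, `level_means`, `dscf_statistic`, `adjacent_level_tests`) are hypothetical, and the toy sentences are invented, not CEFR-SP data.

```python
import math
import statistics
from collections import defaultdict

# CEFR levels in ascending order; only adjacent pairs are compared.
LEVELS = ["A1", "A2", "B1", "B2", "C1", "C2"]


def sentence_features(sentence):
    """Return (sentence length in tokens, mean word length in characters).

    Simplifying assumption: whitespace tokenization with edge punctuation
    stripped, standing in for a real tokenizer such as Stanza.
    """
    tokens = [t.strip(".,;:!?\"'()") for t in sentence.split()]
    tokens = [t for t in tokens if t]
    if not tokens:
        return 0, 0.0
    return len(tokens), sum(len(t) for t in tokens) / len(tokens)


def features_by_level(pairs):
    """Group per-sentence features by CEFR level.

    pairs: iterable of (sentence, level) tuples.
    Returns two dicts {level: [values]}: sentence lengths and word lengths.
    """
    sent_len, word_len = defaultdict(list), defaultdict(list)
    for sentence, level in pairs:
        n_tokens, mean_wl = sentence_features(sentence)
        sent_len[level].append(n_tokens)
        word_len[level].append(mean_wl)
    return sent_len, word_len


def level_means(values_by_level):
    """Basic statistic per level: the mean of that level's values."""
    return {lvl: statistics.mean(v) for lvl, v in values_by_level.items() if v}


def midranks(values):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average of 1-based ranks i+1..j+1
        i = j + 1
    return ranks


def dscf_statistic(lower, upper):
    """Dwass-Steel-Critchlow-Fligner W* for one pair of groups (no tie correction).

    The two groups are ranked jointly; W is the rank sum of `upper`.
    Positive W* means `upper` tends to have larger values. The p-value
    (from the studentized range distribution) is not computed here.
    """
    n_i, n_j = len(lower), len(upper)
    ranks = midranks(list(lower) + list(upper))
    w = sum(ranks[n_i:])
    expected = n_j * (n_i + n_j + 1) / 2
    variance = n_i * n_j * (n_i + n_j + 1) / 12
    return math.sqrt(2) * (w - expected) / math.sqrt(variance)


def adjacent_level_tests(values_by_level):
    """W* for each adjacent pair (A1-A2, A2-B1, ...) with data at both levels."""
    results = {}
    for lower, upper in zip(LEVELS, LEVELS[1:]):
        if values_by_level.get(lower) and values_by_level.get(upper):
            results[f"{lower}-{upper}"] = dscf_statistic(
                values_by_level[lower], values_by_level[upper]
            )
    return results


if __name__ == "__main__":
    # Toy, invented sentences for illustration only (not CEFR-SP data).
    toy = [
        ("The cat sat.", "A1"),
        ("I like green apples.", "A1"),
        ("She walked to the station yesterday.", "A2"),
        ("We stayed home because it rained.", "A2"),
    ]
    sent_len, word_len = features_by_level(toy)
    print(level_means(sent_len), adjacent_level_tests(word_len))
```

To turn each W* into a p-value, it would be compared against the studentized range distribution with k equal to the number of levels and infinite degrees of freedom (SciPy provides `scipy.stats.studentized_range`); that step is left out to keep the sketch dependency-free. Ranking each pair jointly, rather than all levels at once, is what distinguishes this pairwise procedure from a Kruskal-Wallis-style global ranking.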

Article outline

Introduction
- Aim and research questions
- Related work
Building a sentence-based corpus
- Annotation procedure
- Pilot study for sentence-level annotation
- Annotator selection
- Sentence selection
- Annotation results
Methodology
Results
- Basic statistics
- Comparison between sentence-level and text-level datasets
- Keywords
- Typical and atypical sentences
Discussion
Conclusions
Acknowledgements
Notes
References

References

Alfter, D., Tiedemann, T. L., & Volodina, E. (2020). Expert judgments versus crowdsourcing in ordering multi-word expressions. Eighth Swedish Language Technology Conference (SLTC).

Arase, Y., Uchida, S., & Kajiwara, T. (2022). CEFR-based sentence difficulty annotation and assessment. Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 6206–6219.

Bax, S. (2012). Text Inspector: Online text analysis tool.

Capel, A. (2015). The English vocabulary profile. In J. Harrison & F. Barker (Eds.), English profile in practice 51 (pp. 9–27). Cambridge University Press.

Chen, Y. H., & Baker, P. (2016). Investigating criterial discourse features across second language development: Lexical bundles in rated learner essays, CEFR B1, B2 and C1. Applied Linguistics, 37(6), 849–880.

Chujo, K., Oghigian, K., & Akasegawa, S. (2015). A corpus and grammatical browsing system for remedial EFL learners. In A. Leńko-Szymańska & A. Boulton (Eds.), Multiple affordances of language corpora for data-driven learning (pp. 109–130). John Benjamins.

Collins-Thompson, K. (2014). Computational assessment of text readability: A survey of current and future research. ITL - International Journal of Applied Linguistics, 165(2), 97–135.

Critchlow, D. E., & Fligner, M. A. (1991). On distribution-free multiple comparisons in the one-way analysis of variance. Communications in Statistics: Theory and Methods, 20(1), 127–139.

Dürlich, L., & François, T. (2018). EFLLex: A graded lexical resource for learners of English as a foreign language. Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), 873–879.

Dwass, M. (1960). Some k-sample rank-order tests. In I. Olkin, S. G. Ghurye, W. Hoeffding, W. G. Madow, & H. B. Mann (Eds.), Contributions to Probability and Statistics. Essays in Honor of H. Hotelling (pp. 198–202). Stanford University Press.

Ehara, Y. (2018). Building an English vocabulary knowledge dataset of Japanese English-as-a-second-language learners using crowdsourcing. Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), 485–488.

Flesch, R. (1948). A new readability yardstick. Journal of Applied Psychology, 32(3), 221–233.

François, T. (2015). When readability meets computational linguistics: A new paradigm in readability. Revue française de linguistique appliquée, 20(2), 79–97.

François, T., & Fairon, C. (2012). An “AI readability” formula for French as a foreign language. Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, 466–477.

Harrison, J. (2015). The English grammar profile. In J. Harrison & F. Barker (Eds.), English profile in practice 51 (pp. 28–48). Cambridge University Press.

Hawkins, J. A., & Filipović, L. (2012). Criterial features in L2 English: Specifying the reference levels of the Common European Framework (Vol. 11). Cambridge University Press.

Ishii, Y., & Tono, Y. (2018). Investigating Japanese EFL learners’ overuse/underuse of English grammar categories and their relevance to CEFR levels. Proceedings of the 4th Asia Pacific Corpus Linguistics Conference, 160–165.

Jiang, C., Maddela, M., Lan, W., Zhong, Y., & Xu, W. (2020). Neural CRF model for sentence alignment in text simplification. arXiv preprint arXiv:2005.02324.

Khallaf, N., & Sharoff, S. (2021). Automatic difficulty classification of Arabic sentences. Proceedings of the Sixth Arabic Natural Language Processing Workshop, 105–114.

Kilgarriff, A., Husák, M., McAdam, K., Rundell, M., & Rychlý, P. (2008). GDEX: Automatically finding good dictionary examples in a corpus. Proceedings of the XIII EURALEX International Congress, 11, 425–432.

Kim, S. (2021). Generalizability of CEFR criterial grammatical features in a Korean EFL corpus across A1, A2, B1, and B2 levels. Language Assessment Quarterly, 18(3), 273–295.

Klare, G. R. (1968). The role of word frequency in readability. Elementary English, 45(1), 12–22.

Marcus, M. P., Santorini, B., & Marcinkiewicz, M. A. (1993). Building a large annotated corpus of English: The Penn Treebank. Computational Linguistics, 19(2), 313–330.

Nation, I. (2006). How large a vocabulary is needed for reading and listening? Canadian Modern Language Review, 63(1), 59–82.

Nishihara, D., Kajiwara, T., & Arase, Y. (2019). Controllable text simplification with lexical constraint loss. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop, 260–266.

Ozasa, T., Weir, G., & Fukui, M. (2007). Measuring readability for Japanese learners of English. Proceedings of the 12th Conference of Pan-Pacific Association of Applied Linguistics, 122–125.

Pilán, I., Vajjala, S., & Volodina, E. (2016). A readable read: Automatic assessment of language learning materials based on linguistic complexity. arXiv preprint arXiv:1603.08868.

Pilán, I., Volodina, E., & Johansson, R. (2014). Rule-based and machine learning approaches for second language sentence-level readability. Proceedings of the Ninth Workshop on Innovative Use of NLP for Building Educational Applications, 174–184.

Qi, P., Zhang, Y., Zhang, Y., Bolton, J., & Manning, C. D. (2020). Stanza: A Python natural language processing toolkit for many human languages. arXiv preprint arXiv:2003.07082.

Salamoura, A., & Saville, N. (2010). Exemplifying the CEFR: Criterial features of written learner English from the English Profile Programme. In I. Bartning, M. Martin, & I. Vedder (Eds.), Communicative proficiency and linguistic development: Intersections between SLA and language testing research (Vol. 11, pp. 101–132). European Second Language Association.

Steel, R. G. D. (1960). A rank sum test for comparing all pairs of treatments. Technometrics, 21, 197–207.

Uchida, S., & Negishi, M. (2018). Assigning CEFR-J levels to English texts based on textual features. Proceedings of the 4th Asia Pacific Corpus Linguistics Conference, 463–467.

Vajjala, S., & Meurers, D. (2014). Assessing the relative reading level of sentence pairs for text simplification. Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, 288–297.

Cited by

Rudy, Muhammad, Fazri Nur Yusuf, Emi Emilia & Wawan Gunawan

2026. Difficulty level of EFL test designed by pre‑service teachers. Australian Review of Applied Linguistics

Ahlers, Elias-Leander & Malte Schilling

2025. 2025 3rd International Conference on Foundation and Large Language Models (FLLM), pp. 638 ff.

Cooper, Christopher Robert

2025. Predicting the CEFR level of English listening texts with machine learning methods. Research Methods in Applied Linguistics 4:3, pp. 100234 ff.

Cooper, Christopher R.

2025. The Construction Complexity Calculator (ConPlex): A tool for calculating Nelson’s (2024) construction-based complexity measure. Research in Corpus Linguistics 13:2, pp. 124 ff.

Uchida, Satoru & Masashi Negishi

2025. Assigning CEFR-J levels to English learners’ writing: An approach using lexical metrics and generative AI. Research Methods in Applied Linguistics 4:2, pp. 100199 ff.

This list is based on CrossRef data as of 30 March 2026. Please note that it may not be complete. Sources presented here have been supplied by the respective publishers. Any errors therein should be reported to them.