Article published in: Graded Resources for Second and Foreign Language Learning
Edited by David Alfter and Thomas François
[ITL - International Journal of Applied Linguistics 175:1] 2024
pp. 103–126
Profiling English sentences based on CEFR levels
Available under the Creative Commons Attribution (CC BY) 4.0 license.
For any use beyond this license, please contact the publisher at rights@benjamins.nl.
This article was made Open Access under a CC BY 4.0 license through payment of an APC by or on behalf of the authors.
Published online: 22 March 2024
https://doi.org/10.1075/itl.22018.uch
Abstract
This study demonstrates the procedure for constructing the CEFR-based Sentence Profile (CEFR-SP), a dataset
of sentences annotated with CEFR levels, and identifies the characteristics of each level. Basic statistics such as word
length and sentence length are presented by CEFR level for 7,511 carefully selected sentences, and statistical tests are
conducted between adjacent levels to identify criterial features. The findings reveal significant differences in word length
between all adjacent levels, whereas word difficulty does not discriminate significantly at either end of the scale (A1–A2, C1–C2).
Sentence length and depth likewise do not discriminate significantly at the higher levels (B2–C1, C1–C2). Notably, sentence-level
data are generally more discriminative than text-level statistics, suggesting that they capture the characteristics of each CEFR
level more directly.
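The procedure summarized in the abstract (per-level statistics, then tests between adjacent levels) can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the abstract does not name the statistical test, so a two-sided Mann-Whitney U test with a normal approximation is assumed here as a stand-in, no multiple-comparison correction is applied, whitespace tokenization is assumed, and the sentences in `SAMPLE` are invented toy data rather than items from CEFR-SP. The helper names `sentence_stats`, `mann_whitney_u`, and `compare_adjacent` are hypothetical.

```python
from collections import defaultdict
from math import sqrt
from statistics import NormalDist, mean

LEVELS = ["A1", "A2", "B1", "B2", "C1", "C2"]

# Hypothetical toy data (level, pre-tokenized sentence); not taken from CEFR-SP.
SAMPLE = [
    ("A1", "I like my dog ."),
    ("A1", "She is happy ."),
    ("A1", "We eat rice today ."),
    ("A2", "My brother usually walks to school every morning ."),
    ("A2", "They visited their grandmother last weekend ."),
    ("A2", "I want to buy a new phone soon ."),
]


def sentence_stats(sentence):
    """Return (sentence length in words, mean word length in characters).

    Assumes whitespace-separated tokens and at least one word;
    tokens without any letter (punctuation) are ignored.
    """
    tokens = [t for t in sentence.split() if any(c.isalpha() for c in t)]
    return len(tokens), mean(len(t) for t in tokens)


def mann_whitney_u(xs, ys):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected).

    Returns (U statistic for xs, p-value). No continuity correction.
    """
    n1, n2 = len(xs), len(ys)
    pooled = sorted([(v, 0) for v in xs] + [(v, 1) for v in ys])
    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of ranks i+1 .. j for tied values
        t = j - i
        tie_term += t**3 - t
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u = rank_sum_x - n1 * (n1 + 1) / 2
    n = n1 + n2
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance == 0:
        return u, 1.0
    z = (u - n1 * n2 / 2) / sqrt(variance)
    return u, 2 * (1 - NormalDist().cdf(abs(z)))


def compare_adjacent(samples, feature_index):
    """Test one feature (0 = sentence length, 1 = mean word length)
    between each pair of adjacent CEFR levels that both have data."""
    by_level = defaultdict(list)
    for level, sentence in samples:
        by_level[level].append(sentence_stats(sentence)[feature_index])
    results = {}
    for lower, upper in zip(LEVELS, LEVELS[1:]):
        if by_level[lower] and by_level[upper]:
            results[(lower, upper)] = mann_whitney_u(by_level[lower], by_level[upper])
    return results


for (lower, upper), (u, p) in compare_adjacent(SAMPLE, 0).items():
    print(f"{lower} vs {upper}: U={u:.1f}, p={p:.3f}")
```

With a real dataset, an established implementation and a correction for multiple pairwise comparisons would be preferable to this hand-rolled approximation, which is only meant to make the adjacent-level comparison concrete.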
Article outline
- Introduction
- Aim and research questions
- Related work
- Building a sentence-based corpus
- Annotation procedure
- Pilot study for sentence level annotation
- Annotator selection
- Sentence selection
- Annotation results
- Methodology
- Results
- Basic statistics
- Comparison between sentence-level and text-level datasets
- Keywords
- Typical and atypical sentences
- Discussion
- Conclusions
- Acknowledgements
- Notes
