The English Language Learner Insight, Proficiency and Skills Evaluation (ELLIPSE) Corpus

Crossley, Scott; Tian, Yu; Baffour, Perpetual; Franklin, Alex; Kim, Youngmeen; Morris, Wesley; Benner, Meg; Picou, Aigner; Boser, Ulrich

doi:10.1075/ijlcr.22026.cro

Article published in: International Journal of Learner Corpus Research
Vol. 9:2 (2023) ► pp. 248–269

Corpus report

Scott Crossley | Vanderbilt University

Yu Tian | Georgia State University

Perpetual Baffour | The Learning Agency

Alex Franklin | The Learning Agency

Youngmeen Kim | Georgia State University

Wesley Morris | Vanderbilt University

Meg Benner | The Learning Agency

Aigner Picou | The Learning Agency

Ulrich Boser | The Learning Agency

Published online: 8 February 2024

https://doi.org/10.1075/ijlcr.22026.cro

Abstract

This paper introduces the open-source English Language Learner Insight, Proficiency and Skills Evaluation (ELLIPSE) corpus. The corpus comprises approximately 6,500 essays written by English language learners (ELLs) during state-wide standardized annual testing in the United States. The essays respond to 29 different independent prompts that required no background knowledge on the part of the writer. Individual difference information is available for each essay, including economic status, gender, grade level (8–12), and race/ethnicity. Each essay was scored by two trained human raters for English language proficiency, yielding an overall score of English proficiency and analytic scores for cohesion, syntax, vocabulary, phraseology, grammar, and conventions. The paper reports the reliability of the human judgments of proficiency. The ELLIPSE corpus addresses many of the concerns raised about existing learner corpora, notably by providing unique holistic and analytic scores for each ELL essay. The corpus also includes limited demographic and individual difference data for each ELL.

Keywords: language proficiency, language testing, open source learner corpus

Article outline

1. Introduction
- 1.2 Measuring proficiency
2. The ELLIPSE Corpus
- 2.1 Initial corpus
- 2.2 Proficiency scoring
- 2.2 Final ELLIPSE corpus
  - 2.2.1 Text statistics
  - 2.2.2 Meta-data
  - 2.2.3 Score distribution
3. Conclusion
Open Material badge
Notes
References

Cited by (10)

Cited by ten other publications

Leşeanu, Anda, İbrahim Rıza Hallaç, Burçin Buket Oğul & Hasan Oğul

2026. Revisiting Automatic Essay Assessment: A Relative Approach. In Hybrid Artificial Intelligent Systems [Lecture Notes in Computer Science, 16203], ► pp. 41 ff.

Chu, Seong Yeub, Jong Woo Kim & Mun Yong Yi

2025. Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems, ► pp. 1 ff.

Crossley, Scott A., Perpetual Baffour, L. Burleigh & Jules King

2025. A large-scale corpus for assessing source-based writing quality: ASAP 2.0. Assessing Writing 65 ► pp. 100954 ff.

Gürel, Sungur, Murat Şahin, İbrahim Uysal, Ali İbileme & Tuba Gündüz

2025. Adaptive Selection Algorithm and Standard Error Termination Rule in Comparative Judgement: An Application for Assessing Writing Skills. Education and Science 50 ► pp. 93 ff.

Oğuz, Enis

2025. Can generative AI figure out figurative language? The influence of idioms on essay scoring by ChatGPT, Gemini, and Deepseek. Assessing Writing 66 ► pp. 100981 ff.

Thwaites, Peter, Nathan Vandeweerd & Magali Paquot

2025. Crowdsourced Comparative Judgement for Evaluating Learner Texts: How Reliable are Judges Recruited from an Online Crowdsourcing Platform? Applied Linguistics 46:4 ► pp. 611 ff.

Yamashita, Taichi

2025. Exploring potential biases in GPT-4o’s ratings of English language learners’ essays. Language Testing 42:3 ► pp. 344 ff.

Zambrano, Andres Felipe, Shreya Singhal, Maciej Pankiewicz, Ryan Shaun Baker, Chelsea Porter & Xiner Liu

2025. De-identifying student personally identifying information in discussion forum posts with large language models. Information and Learning Sciences 126:5/6 ► pp. 401 ff.

Mahmoud, Somaia, Emad Nabil & Marwan Torki

2024. Automatic Scoring of Arabic Essays: A Parameter-Efficient Approach for Grammatical Assessment. IEEE Access 12 ► pp. 142555 ff.

Paquot, Magali

2024. Learner corpus research: a critical appraisal and roadmap for contributing (more) to SLA research agendas. Corpus Linguistics and Linguistic Theory 20:3 ► pp. 567 ff.

This list is based on CrossRef data as of 12 December 2025. Please note that it may not be complete. Sources presented here have been supplied by the respective publishers. Any errors therein should be reported to them.