In: Beyond Concordance Lines: Corpora in language education
Edited by Pascual Pérez-Paredes and Geraldine Mark
[Studies in Corpus Linguistics 102] 2021
pp. 207–230
Chapter 9. Scoledit
A tool to analyse learner writing and better understand the challenges of language education
Published online: 22 December 2021
https://doi.org/10.1075/scl.102.09wol
Abstract
The Scoledit project aims to build a computer-aided longitudinal corpus of texts written by pupils aged 6 to 11, together with associated automatic processing tools. The project seeks to produce linguistic descriptions of pupils’ writing and to facilitate the teaching of spelling and writing. A growing number of projects aim to create large primary school corpora of French (Elalouf, 2005; Garcia-Debanc & Bonnemaison, 2014; David & Doquet, 2016). However, these corpora are neither longitudinal nor associated with natural language processing (NLP) tools (Wolfarth, 2017). This chapter discusses some of the automated tools developed for linguistic analysis and the advantages of the Scoledit project in the context of language teaching.
Article outline
- Context
- The Scoledit project
- Corpus design
- Specific tools for processing
- Description of the longitudinal corpus
- Grammatical categories
- Breakdown of error categories
- Breakdown of errors by grammatical category
- Observation of verbal morphology
- Breakdown of verb tenses
- Error breakdown
- Distinction of errors in the stem and the inflection
- Teaching recommendations on verbal tenses
- Hyposegmentation and hypersegmentation
- Elision, a frequent factor in hyposegmentation
- Hyposegmentation: The case of reflexive verbs
- A particular hyposegmentation issue: The alternation of la/‘l’a’
- Teaching recommendations on word segmentation
- Conclusion
- Notes
- References
References (24)
Banerji, N., Gupta, V., Kilgarriff, A., & Tugwell, D. (2013). Oxford children’s corpus: A corpus of children’s writing, reading, and education. Corpus Linguistics 2013, 315–317.
Berkling, K. (2016). Corpus for children’s writing with enhanced output for specific spelling patterns (2nd and 3rd Grade). In N. Calzolari, K. Choukri, T. Declerck, S. Goggi … S. Piperidis (Eds.), Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016) (pp. 3200–3206). European Language Resources Association (ELRA).
Berkling, K. (2018). A 2nd longitudinal corpus for children’s writing with enhanced output for specific spelling patterns. In N. Calzolari, K. Choukri, C. Cieri, T. Declerck … T. Tokunaga (Eds.), Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC-2018) (pp. 2262–2268). European Language Resources Association (ELRA).
Boré, C., & Elalouf, M.-L. (2017). Deux étapes dans la construction de corpus scolaires: Problèmes récurrents et perspectives nouvelles. Corpus, 16, 31–64.
Brissaud, C., & Chevrot, J.-P. (2011). The late acquisition of a major difficulty of French inflectional orthography: The homophonic /E/ verbal endings. Writing Systems Research, 3(2), 129–144.
Catach, N. (1980). L’orthographe française: Traité théorique et pratique avec des travaux d’application et leurs corrigés (Vol. 3). Nathan.
Chipere, N., Malvern, D., & Richards, B. (2004). Using a corpus of children’s writing to test a solution to the sample size problem affecting type-token ratios. In G. Aston, S. Bernardini, & D. Stewart (Eds.), Corpora and language learners (pp. 139–147). John Benjamins.
Clanché, P. (1988). L’enfant écrivain: Génétique et symbolique du texte libre. Paidos Le Centurion. Persée. Retrieved from [URL]
De Vogüé, S., Espinoza, N., Garcia, B., Perini, M., & Marzena Watorek, F. (2017). Constitution d’un grand corpus d’écrits émergents et novices: Principes et méthodes. Corpus, 16, 65–86.
Doquet, C., Enoiu, V., Fleury, S., & Maziotti, S. (2017). Problèmes posés par la transcription et l’annotation d’écrits d’élèves. Corpus, 16, 133–156.
Elalouf, M.-L. (2005). Écrire entre 10 et 14 ans: Un corpus, des analyses, des repères pour la formation. Canopé – CRDP de Versailles.
Garcia-Debanc, C., & Bonnemaison, K. (2014). La gestion de la cohésion textuelle par des élèves de 11–12 ans: Réussites et difficultés. Actes du 4e Congrès Mondial de Linguistique Française (CMLF 2014), Juillet 2014, 8, 961–976.
Gendner, V., & Adda-Decker, M. (2002). Analyse comparative de corpus oraux et écrits français: Mots, lemmes et classes morpho-syntaxiques. Actes des XIVes Journées d’Etude sur la Parole, Nancy.
Juel, C. (1988). Learning to read and write: A longitudinal study of 54 children from first through fourth grades. Journal of Educational Psychology, 80(4), 437–447.
Lavalley, R., Berkling, K., & Stüker, S. (2015). Preparing children’s writing database for automated processing. LTLT@SLaTE, 9–15.
Lété, B., Sprenger-Charolles, L., & Colé, P. (2004). MANULEX: A grade-level lexical database from French elementary school readers. Behavior Research Methods, Instruments, & Computers, 36, 156–166.
Penloup, M.-C. (2001). De quelques propriétés d’une pratique de lecture extrascolaire: Le courrier des lecteurs du journal Astrapi. Repères. Recherches en Didactique du Français Langue Maternelle, 23(1), 75–91.
Savelli, M., Brissaud, C., Chevrot, J.-P., & Gounon, V. (2002). L’apprentissage d’un temps peu enseigné: Le passé simple. Le Français Aujourd’hui, 4, 39–48.
Schmid, H. (1994). Probabilistic part-of-speech tagging using decision trees. In Proceedings of the International Conference on New Methods in Language Processing. Manchester, UK (pp. 44–49).
Smith, N., McEnery, T., & Ivanic, R. (1998). Issues in transcribing a corpus of children’s handwritten projects. Literary and Linguistic Computing, 13(4), 217–225.
Wolfarth, C., Brissaud, C., & Ponton, C. (2018). Transcrire et normer un corpus scolaire: Pour quelles analyses? In C. Brissaud, M. Dreyfus, & B. Kervyn (Eds.), Repenser l’écriture et son évaluation au primaire et au secondaire (pp. 121–146). Presses universitaires de Namur.
