Article published in: International Journal of Corpus Linguistics
Vol. 17:3 (2012) ► pp. 428–441
Corpus CesCa
Compiling a corpus of written Catalan produced by school children
Published online: 25 January 2013
https://doi.org/10.1075/ijcl.17.3.06lla
This paper outlines the compilation of a corpus of Catalan written production. The CesCa corpus presents a picture of the Catalan written language throughout compulsory schooling. It contains two kinds of data: vocabularies from five semantic fields, comprising 242,404 lexical forms, and textual data from four different discourse genres, consisting of 207,028 tokens. Both the vocabularies and the textual data have been morphologically analyzed and lemmatized. The corpus is freely available. The paper describes the main features of the corpus and suggests some uses to which it can be put.
Keywords: written Catalan, lexical development, vocabularies, discourse genres
Cited by 11 other publications
Lücke, Stephan, Patricia de Crignis, Johanna Wolf & Florian Zacherl
Lücke, Stephan, Patricia de Crignis, Johanna Wolf & Florian Zacherl
Alonso-Cortés-Fradejas, M. Dolores, Mercedes López-Aguado & M. Teresa Llamazares-Prieto
2021. Vocabulary depth and its contribution to text quality in the early years of primary school (Profundidad de vocabulario y su contribución a la calidad textual en los primeros años de Educación Primaria). Journal for the Study of Education and Development: Infancia y Aprendizaje 44:1 ► pp. 82 ff.
Tolchinsky, Liliana
2021. Linguistic patterns of spelling of isolated words to dictation and text-composing in Catalan across elementary school (Patrones lingüísticos de la ortografía en el dictado de palabras aisladas y en la composición de textos en catalán en la escuela primaria). Journal for the Study of Education and Development: Infancia y Aprendizaje 44:1 ► pp. 183 ff.
Salas, Naymé
2022. Concurrent predictors of spelling accuracy in secondary education in a semi-consistent orthography. Written Language & Literacy 25:1 ► pp. 40 ff.
Llaurado, Anna & Julie E. Dockrell
Llaurado, Anna & Julie E. Dockrell
Castillo, Cristina & Liliana Tolchinsky
Llaurado, Anna & Liliana Tolchinsky
This list is based on CrossRef data as of 12 December 2025. Please note that it may not be complete. Sources presented here have been supplied by the respective publishers. Any errors therein should be reported to them.
