Article published in: International Journal of Corpus Linguistics
Vol. 23:4 (2018), pp. 494–508
BasiScript
A corpus of contemporary Dutch texts written by primary school children
Published online: 27 December 2018
https://doi.org/10.1075/ijcl.17086.tel
Abstract
This short paper introduces BasiScript, a 9-million-word corpus of contemporary Dutch texts written by primary
school children. The data were collected over three years with 17,216 children contributing texts throughout this period. Each
word token in the corpus is annotated with the correct orthographical form, the associated lemma and the part of speech. The most
frequent polysemous words have been annotated for word meaning, while all words in the lexicon that was derived from the
BasiScript corpus have been annotated for corpus and subcorpora frequency, dispersion, length, family size, family frequency,
orthographic neighborhood size, and orthographic neighborhood frequency. Images of the texts are available to researchers. The
present article describes the corpus and compares BasiScript with BasiLex (a Dutch corpus of texts that primary
school children are likely to read, completed in 2015) by means of frequency profiling.
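As an illustration of what frequency profiling involves, the following is a minimal sketch of the log-likelihood statistic commonly used for this kind of corpus comparison (in the spirit of Rayson & Garside 2000). It is not the authors' implementation; the function names `log_likelihood` and `frequency_profile` and the toy counts in the usage note are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def log_likelihood(freq_a, size_a, freq_b, size_b):
    """Log-likelihood score for one word observed in two corpora.

    freq_a, freq_b: observed frequencies of the word in corpus A and B.
    size_a, size_b: total number of tokens in corpus A and B.
    """
    total = size_a + size_b
    # Expected frequencies if the word were equally common in both corpora.
    expected_a = size_a * (freq_a + freq_b) / total
    expected_b = size_b * (freq_a + freq_b) / total
    score = 0.0
    # A zero observed frequency contributes nothing to the sum.
    if freq_a > 0:
        score += freq_a * math.log(freq_a / expected_a)
    if freq_b > 0:
        score += freq_b * math.log(freq_b / expected_b)
    return 2 * score


def frequency_profile(counts_a, counts_b):
    """Rank all words by how strongly their frequencies differ between corpora."""
    size_a = sum(counts_a.values())
    size_b = sum(counts_b.values())
    words = set(counts_a) | set(counts_b)
    scored = [
        (word, log_likelihood(counts_a.get(word, 0), size_a,
                              counts_b.get(word, 0), size_b))
        for word in words
    ]
    # Highest score first: these words distinguish the corpora most.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Calling `frequency_profile(Counter(child_tokens), Counter(reading_tokens))` on two token lists (hypothetical variable names) would return word and score pairs, with the words whose relative frequencies differ most between the two corpora at the top. A score of zero means the word is equally frequent, relative to corpus size, in both corpora.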
Article outline
- 1. Introduction
- 2. BasiScript
- 2.1 Data collection
- 2.2 Format
- 2.3 Annotation
- 3. BasiScript versus BasiLex: Frequency profiling
- 3.1 Method
- 3.2 Differences in the frequencies of function words
- 4. Conclusion
- Acknowledgements
- References
References (17)
Balota, D., Yap, M., & Cortese, M. J. (2006). Visual word recognition. In M. J. Traxler & M. A. Gernsbacher (Eds.), Handbook of Psycholinguistics (pp. 285–376). Amsterdam: Elsevier Academic Press.
Bracken, S., & Fischel, J. E. (2008). Family reading behaviour and literacy skills in preschool children from low-income backgrounds. Early Education and Development, 19(1), 45–67.
Chiu, S. I., Hong, F. Y., & Hu, H. Y. (2015). The effects of family cultural capital and reading motivation on reading behaviour in elementary school students. School Psychology International, 36(1), 3–17.
Clark, C., & Teravainen, A. (2017). Book Ownership and Reading Outcomes. London: National Literacy Trust.
Drijbooms, E., Groen, M., & Verhoeven, L. (2017). How executive functions predict development in syntactic complexity of narrative writing in the upper elementary grades. Reading & Writing, 30(1), 209–231.
Evers-Vermeul, J., & Sanders, T. (2009). The emergence of Dutch connectives; how cumulative cognitive complexity explains the order of acquisition. Journal of Child Language, 36(4), 829–854.
Fayol, M., & Mouchon, S. (1997). Production and comprehension of connectives in the written modality: A study of written French. In C. Pontecorvo (Ed.), Writing Development: An Interdisciplinary View (pp. 193–204). Amsterdam/Philadelphia, PA: John Benjamins.
Johannes, K., Wilson, C., & Landau, B. (2016). The importance of lexical verbs in the acquisition of spatial prepositions: The case of in and on. Cognition, 157, 174–189.
Kent, S., & Wanzek, J. (2016). The relationship between component skills and writing quality and production across developmental levels: A meta-analysis of the last 25 years. Review of Educational Research, 86(2), 570–601.
Meints, K., Plunkett, K., Harris, P. L., & Dimmock, D. (2002). What is ‘on’ and ‘under’ for 15-, 18-, and 24-month-olds? Typicality effects in early comprehension of spatial prepositions. British Journal of Developmental Psychology, 20(1), 113–130.
Penning de Vries, B., & Tellings, A. (forthcoming). Development of connective frequency in Dutch child-directed texts: a corpus analysis.
Perfetti, C. A., & Hart, L. (2001). The lexical quality hypothesis. In L. Verhoeven, C. Elbro, & P. Reitsma (Eds.), Precursors of Functional Literacy (pp. 189–214). Amsterdam/Philadelphia, PA: John Benjamins.
Peterson, C., & McCabe, A. (1987). The connective “and”: Do older children use it less as they learn other connectives? Journal of Child Language, 14(2), 375–381.
Rayson, P., & Garside, R. (2000). Comparing Corpora using Frequency Profiling. In Proceedings of the workshop on Comparing Corpora, 38th annual meeting of the Association for Computational Linguistics (ACL 2000), 1–6. Hong Kong.
Tellings, A., Hulsbosch, M., Vermeer, A., & van den Bosch, A. (2014). BasiLex: An 11.5 million word corpus of Dutch texts written for children. Computational Linguistics in the Netherlands, 4, 191–208.
Van den Bosch, A., Busser, G. J., Daelemans, W., & Canisius, S. (2007). An efficient memory-based morphosyntactic tagger and parser for Dutch. In F. van Eynde, P. Dirix, I. Schuurman, & V. Vandeghinste (Eds.), Selected Papers of the 17th Computational Linguistics in the Netherlands Meeting (CLIN-17, Leuven), (pp. 99–114). Utrecht: LOT. Retrieved from [URL] (last accessed September 2018).
