Children Online: A survey of child language and CMC corpora

Baron, Alistair; Rayson, Paul; Greenwood, Phil; Walkerdine, James; Rashid, Awais

doi:10.1075/ijcl.17.4.01bar

Article published in: International Journal of Corpus Linguistics
Vol. 17:4 (2012), pp. 443–481

Alistair Baron | Lancaster University

Paul Rayson | Isis Forensics

Phil Greenwood

James Walkerdine

Awais Rashid

Published online: 29 March 2013

The collection of representative corpus samples of both child language and online (CMC) language varieties is crucial for linguistic research that is motivated by applications to the protection of children online. In this paper, we present an extensive survey of corpora available for these two areas. Although a significant amount of research has been undertaken both on child language and on CMC language varieties, a much smaller number of datasets are made available as corpora. Especially lacking are corpora which match requirements for verifiable age and gender metadata, although some include self-reported information, which may be unreliable. Our survey highlights the lack of corpus data available for the intersecting area of child language in CMC environments. This lack of available corpus data is a significant drawback for those wishing to undertake replicable studies of child language and online language varieties.

Keywords: survey, child language, CMC

Cited by one other publication

Yuan, Yue, Jiaxin Gu, Xin Guo, Yushu Zhu & Qiang Fu

2022. Detecting temporal anomalies with pseudo age groups: Homeownership in Canada, 1981 to 2016. Population, Space and Place 28:1

This list is based on CrossRef data as of 12 December 2025. Please note that it may not be complete. Sources presented here have been supplied by the respective publishers. Any errors therein should be reported to them.