Article published in: International Journal of Corpus Linguistics
Vol. 10:4 (2005), pp. 517–541
Creating and using Web corpora
Published online: 7 November 2005
https://doi.org/10.1075/ijcl.10.4.07the
The Web has recently been used as a corpus for linguistic investigations, often with the help of a commercial search engine. We discuss some potential problems with collecting data from commercial search engines and with using the Web as a corpus. We outline an alternative strategy for data collection, using a personal Web crawler. As a case study, the university Web sites of three nations (Australia, New Zealand and the UK) were crawled. The most frequent words were broadly consistent with those of non-Web written English, but with some academic-related words among the 50 most frequent. It was also evident that the university Web sites contained a significant amount of non-English text, and that academic Web English seems to be more future-oriented than British National Corpus written English.
Keywords: academic language, web corpus, web
