Corpus analysis

Jan Aarts

Nowadays, when linguists speak of a corpus, they usually mean a collection of computer-readable texts. The design of the collection and the nature of the texts may vary considerably from one corpus to another, but the texts, whether spoken or written, must have been produced in an actual context of language use. The utterances constituting the texts are never artificial linguistic objects produced under laboratory conditions for the sole purpose of linguistic research. Two properties of corpora, that they are computationally accessible and that they are repositories of language use, largely determine the nature of the linguistic research for which they are used. First, corpus analysis can no longer be carried out without advanced computational tools; second, it is naturally oriented towards the study of language use and is therefore biased towards the study of specific languages, genres and language varieties.

 