Corpus analysis

Jan Aarts

Nowadays, when linguists speak of a corpus, they usually mean a collection of computer-readable texts. The design of the collection and the nature of the texts may vary considerably from one corpus to another, but the texts, whether spoken or written, must have been produced in an actual context of language use. The utterances that make up the texts are never artificial linguistic objects produced under laboratory conditions for the sole purpose of linguistic research. Two facts largely determine the nature of the linguistic research for which corpora are used: they are computationally accessible, and they are repositories of language use. First, corpus analysis cannot be carried out without advanced computational tools; second, it is naturally oriented towards the study of language use and is therefore biased towards the study of specific languages, genres and language varieties.

References

Aarts, J.

1992 Comments on ICE. In J. Svartvik (ed.): 180–183.

Aarts, J., P. De Haan & N. Oostdijk

(eds.) 1993 English language corpora. Rodopi.

Black, E., R. Garside & G. Leech

(eds.) 1993 Statistically-driven computer grammars of English. Rodopi.

Burnage, G. & D. Dunlop

1993 Encoding the British National Corpus. In J. Aarts, P. De Haan & N. Oostdijk (eds.): 79–95.

Collot, M. & N. Belmore

1993 Electronic language. In J. Aarts, P. De Haan & N. Oostdijk (eds.): 41–55.

Granger, S.

1993 International Corpus of Learner English. In J. Aarts, P. De Haan & N. Oostdijk (eds.): 57–69.

Greenbaum, S.

1992 A new corpus of English: ICE. In J. Svartvik (ed.): 171–179.

Harris, Z.

1951 Methods in structural linguistics. University of Chicago Press.

Johansson, S.

1980 The LOB corpus of British English texts: presentation and comments. ALLC Journal 1: 25–36.

Johansson, S. & K. Hofland

1994 Towards an English-Norwegian parallel corpus. In U. Fries, G. Tottie & P. Schneider (eds.) Creating and using English language corpora: 25–37. Rodopi.

Johansson, S. & A-B. Stenström

(eds.) 1991 English computer corpora. Mouton de Gruyter.

Karlsson, F.

1994 Robust parsing of unconstrained text. In N. Oostdijk & P. De Haan (eds.): 121–142.

Keulen, F.

1986 The Dutch computer corpus pilot project. In J. Aarts & W. Meijs (eds.) Corpus linguistics II: 127–155. Rodopi.

Knowles, G.

1993 The Machine-Readable Spoken English Corpus. In J. Aarts, P. De Haan & N. Oostdijk (eds.): 107–119.

Kučera, H. & W.N. Francis

1967 Computational analysis of present-day American English. Brown University Press.

Kytö, M.

1991 Manual to the diachronic part of the Helsinki corpus of English texts. Helsinki University Dept. of English.

Kytö, M., M. Rissanen & S. Wright

(eds.) 1994 Corpora across the centuries. Rodopi.

Leech, G.

1991 The state of the art in corpus linguistics. In K. Aijmer & B. Altenberg (eds.) English corpus linguistics: 8–29. Longman.

Leech, G. & R. Garside

1991 Running a grammar factory. In S. Johansson & A-B. Stenström (eds.): 15–32.

Leech, G., R. Garside & M. Bryant

1994 The large-scale grammatical tagging of text: experience with the British National Corpus. In N. Oostdijk & P. De Haan (eds.): 47–63.

Marcus, M., B. Santorini & M. Marcinkiewicz

1993 Building a large annotated corpus of English. Computational Linguistics 19: 313–330.

Oostdijk, N. & P. De Haan

(eds.) 1994 Corpus-based research into language. Rodopi.

Quirk, R.

1960 Towards a description of English usage. Transactions of the Philological Society: 40–61.

1992 On corpus principles and design. In J. Svartvik (ed.): 457–469.

Renouf, A.

1993 A word in time: first findings from the investigation of dynamic text. In J. Aarts, P. De Haan & N. Oostdijk (eds.): 279–288.

Sampson, G.

1994 SUSANNE: a Domesday Book of English grammar. In N. Oostdijk & P. De Haan (eds.): 169–187.

Souter, C.

1989 A short handbook to the Polytechnic of Wales corpus. Norwegian Computing Centre for the Humanities.

Svartvik, J.

(ed.) 1990 The London-Lund corpus of spoken English. Lund University Press.

(ed.) 1992 Directions in corpus linguistics. Mouton de Gruyter.

Taylor, L., G. Leech & S. Fligelstone

1991 A survey of English machine-readable corpora. In S. Johansson & A-B. Stenström (eds.): 319–354.

[See also: Statistics]