In: Corpus Linguistics and African Englishes
Edited by Alexandra U. Esimaje, Ulrike Gut and Bassey E. Antia
[Studies in Corpus Linguistics 88] 2019
pp. 7–36
Chapter 1.1 What is corpus linguistics?
Published online: 13 February 2019
https://doi.org/10.1075/scl.88.02esi
Abstract
Corpus linguistics has made great strides in language research and teaching, but it is only partly known to many African academics and linguistic communities, so its potential is largely lost to them. This chapter introduces corpus linguistics to African researchers and others who are unfamiliar with the field, or have limited knowledge of it, and who are interested in using its methods for linguistic analysis. It introduces the concept of corpus linguistics (Section 1), explains some of its key terms and concepts (Section 2), and considers the types of corpora as well as the scope and applications of corpus linguistics (Section 3).
Article outline
- 1. The concept of corpus linguistics
- Corpus: A definition
- Corpus linguistics: History and significance
- Languages
- Size
- Specialisation
- Individuals and software
- Statistics
- Corpus linguistics: Historical debates
- 2. Key concepts in corpus linguistics
- Corpus design
- Content
- Size
- Balance
- Representativeness
- Corpus output
- Corpus annotation
- Tagging
- Parsing
- Error tagging
- Semantic tagging
- 3. Types and applications of corpora
- Types of corpora
- General corpus
- Specialised corpus
- Comparable and parallel corpora
- Learner corpus
- Diachronic corpus
- Available corpora and software
- The scope and applications of corpus linguistics
- Conclusion
Cited by four other publications
Botha, Werner, Bertus van Rooy & Susan Coetzee‐van Rooy
Lima-Lopes, Rodrigo Esteves, Karen Tank Mercuri & Maristella Gabardo
Esimaje, Alexandra U.
2019. The purpose, design and use of the Corpus of Nigerian and Cameroonian English Learner Language (Conacell). In Corpus Linguistics and African Englishes [Studies in Corpus Linguistics 88], pp. 71 ff.
This list is based on CrossRef data as of 1 December 2025. Please note that it may not be complete. Sources presented here have been supplied by the respective publishers. Any errors therein should be reported to them.
