In: The Corpus Linguistics Discourse: In honour of Wolfgang Teubert
Edited by Anna Čermáková and Michaela Mahlberg
[Studies in Corpus Linguistics 87] 2018
pp. 35–75
Modes of analysis
An autoergography of corpus linguistics from lexis to discourse
Published online: 6 December 2018
https://doi.org/10.1075/scl.87.03kri
Abstract
Throughout history, the study of language has involved identifying units, their patterns of combination, the functions they serve and the processes involved, and construing the relationships between these features and the meanings that arise from them. Developments in technology inevitably revolutionise ideas, give rise to new methods, and transform the fields in which they are applied. Just as the invention and development of writing systems, and later the advent of printing, revolutionised the study of language in the past, computers have had a comparable effect on linguistics in recent decades. This paper looks at some of the ways in which corpus linguistics has used the latest technologies to undertake a substantial re-investigation and re-appraisal of the elements of language, their roles within the language system, and their relationships within the wider social context of human beings, their environments and their activities. The structure of the paper owes much to recent retrospection, but the research it draws on consists mainly of my early explorations of corpus linguistics; hence ‘autoergography’ (auto ‘self’ + ergo ‘work’ + graphy ‘description’) in the title.
Article outline
- 1. Introduction
- 2. Ancient and modern linguistics
- 3. Corpus linguistics
- 4. Character mode
- 5. Morpheme mode
- 6. Word mode
- 7. Lemma mode
- 8. Concordance mode
- 9. Collocation mode
- 9.1 Adjacent collocations: N-grams
- 9.2 Non-adjacent collocations: Span and statistics
- 9.3 Collocation and phraseology
- 9.4 Collocation and grammar
- 9.5 Collocation and evaluation: Semantic prosody
- 10. Text and corpus mode
- 11. Discourse mode
- 12. Conclusions
- Notes
- References
