In: The Corpus Linguistics Discourse: In honour of Wolfgang Teubert
Edited by Anna Čermáková and Michaela Mahlberg
[Studies in Corpus Linguistics 87] 2018
pp. 9–34
The (very) long history of corpora, concordances, collocations and all that
Published online: 6 December 2018
https://doi.org/10.1075/scl.87.02stu
Abstract
In the development of academic disciplines, important ideas are often proposed, forgotten, and then rediscovered much later, when they are connected to other ideas in a way which reveals their significance. I give examples of ideas which are often thought of as quite modern, although they have a very long history:
– using corpora in constructing dictionaries and language teaching materials
– using concordances as data for textual exegesis and information retrieval
– using collocations as evidence of word meaning.
In all three cases the theoretical significance of the ideas became clear only after improved techniques of visualisation allowed patterns to be seen in complex non-numerical data.
Article outline
- 1. Overview
- 2. Previous work
- 3. Concordancing content
- 4. Index, verbal concordance and “real” concordance
- 5. Concordancing form
- 6. Meaning and use
- 7. Practice and theory
- 8. Digital corpora
- 9. Collocation and phraseology
- 10. Meaningful quantification
- 11. KWIC (Key word in context) concordances
- 12. Concordance packages and programming languages
- 13. Conclusion
- Acknowledgements
- Notes
- References
