In: Applications of Pattern-driven Methods in Corpus Linguistics
Edited by Joanna Kopaczyk and Jukka Tyrkkö
[Studies in Corpus Linguistics 82] 2018
pp. 81–104
Chapter 4. Lexical obsolescence and loss in English: 1700–2000
Published online: 13 March 2018
https://doi.org/10.1075/scl.82.04tic
Abstract
This chapter explores a new methodology for extracting from large corpora forms that were once common but are now obsolete. It proceeds from the relatively under-researched problem of lexical mortality, or obsolescence in general, to the formulation of two closely related procedures for querying the n-gram data of the Google Books project. Their aim is to identify the best candidate words and lexical expressions that may have become lost or obsolete over the last three centuries, from the Late Modern era to Present-day English (1700–2000). After describing the techniques used to process large unigram and trigram datasets, the chapter offers a selective analysis of the results and proposes ways in which the methodology may help corpus linguists as well as historical lexicographers.
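The general idea of querying n-gram counts for forms that were frequent early in the period but later fell out of use can be illustrated with a small Python sketch. It is not the chapter's own procedure, only one simple way such a filter could work under stated assumptions. The sketch assumes input lines whose first three tab-separated fields are ngram, year and match_count, as in the downloadable Google Books Ngram files. It also assumes a hypothetical `totals` mapping from each year to its total token count. The period boundaries and per-million thresholds (`min_early_pm`, `max_late_pm`) are arbitrary illustrative values. Real data would additionally need the pruning of proper names, OCR errors and variety-specific forms listed in the outline.

```python
from collections import defaultdict
from typing import Iterable, Iterator

# (ngram, year, match_count); any further fields in an input line are ignored.
Row = tuple[str, int, int]


def parse_rows(lines: Iterable[str]) -> Iterator[Row]:
    """Parse tab-separated n-gram lines, assuming the first three fields
    are ngram, year and match_count."""
    for line in lines:
        ngram, year, match_count, *_rest = line.rstrip("\n").split("\t")
        yield ngram, int(year), int(match_count)


def period_rate(counts: dict[int, int], totals: dict[int, int],
                start: int, end: int) -> float:
    """Occurrences per million tokens of one n-gram in years start..end."""
    hits = sum(c for y, c in counts.items() if start <= y <= end)
    size = sum(t for y, t in totals.items() if start <= y <= end)
    return 1e6 * hits / size if size else 0.0


def obsolescence_candidates(rows: Iterable[Row], totals: dict[int, int],
                            early: tuple[int, int] = (1700, 1799),
                            late: tuple[int, int] = (1900, 2000),
                            min_early_pm: float = 1.0,
                            max_late_pm: float = 0.01
                            ) -> list[tuple[str, float, float]]:
    """Return (ngram, early_pm, late_pm) for n-grams that were frequent in
    the early period but (nearly) absent in the late one, most frequent
    first. Period bounds and thresholds are illustrative assumptions."""
    by_ngram: dict[str, dict[int, int]] = defaultdict(lambda: defaultdict(int))
    for ngram, year, count in rows:
        by_ngram[ngram][year] += count  # sum counts per n-gram and year

    candidates = []
    for ngram, counts in by_ngram.items():
        early_pm = period_rate(counts, totals, *early)
        late_pm = period_rate(counts, totals, *late)
        if early_pm >= min_early_pm and late_pm <= max_late_pm:
            candidates.append((ngram, early_pm, late_pm))
    # Rank by former frequency: once-common forms come first.
    candidates.sort(key=lambda c: c[1], reverse=True)
    return candidates


# Invented toy counts for illustration only (not Google Books data).
toy_lines = [
    "oldform\t1750\t50\t10",
    "stableform\t1750\t500\t90",
    "stableform\t1950\t600\t120",
]
toy_totals = {1750: 1_000_000, 1950: 1_000_000}
result = obsolescence_candidates(parse_rows(toy_lines), toy_totals)
print(result)  # [('oldform', 50.0, 0.0)]
```

With these toy counts, the hypothetical `oldform` is kept (50 per million early, none late), while `stableform` is filtered out because it remains frequent. Normalising by yearly totals matters because the size of the Google Books collection grows strongly over time, so raw counts alone would exaggerate later frequencies. The same function works unchanged for unigrams and trigrams, since an n-gram is treated as an opaque string.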
Article outline
- 1. Introduction
- 1.1 Research questions
- 1.2 Theoretical problems and practical definitions
- 2. The corpus and its problems
- 2.1 The n-grams
- 3. Methodology
- 3.1 Data requirements
- 3.2 Word obsolescence
- 3.3 Pruning and sorting the results
- a. Proper names
- b. OCR errors
- c. Variety-specific forms
- 3.4 Obsolescence of multi-word expressions
- 3.5 Technicalities
- 4. Analysis and discussion of the results
- 4.1 Unigrams
- 4.2 Trigrams
- 4.3 Future research
- 5. Conclusions
- Acknowledgements
- Notes
- References
References (17)
Aitchison, Jean. 1987. Words in the Mind: An Introduction to the Mental Lexicon. New York NY: Blackwell.
Bouma, Gerlof. 2009. Normalized (pointwise) mutual information in collocation extraction. Proceedings of GSCL Conference, 31–40.
Coleman, Robert. 1990. The assessment of lexical mortality and replacement between old and modern English. In Papers from the 5th International Conference on English Historical Linguistics [Current Issues in Linguistic Theory 65], Sylvia M. Adamson, Vivien A. Law, Nigel Vincent & Susan Wright (eds), 69–86. Amsterdam: John Benjamins.
Čermák, Jan. 2008. Ælfric’s homilies and incipient typological change in the 12th century English word-formation. Acta Universitatis Philologica: Prague Studies in English XXV(1): 109–115.
Davies, Mark. 2012. Google Books corpus. Google Books Corpus. <[URL]> (1 February 2016).
Google Books History. 2009. <[URL]> (29 November 2015).
Gries, Stefan T. 2008. Dispersions and adjusted frequencies in corpora. International Journal of Corpus Linguistics 13(4): 403–437.
Hales, Steven D. 2005. Thinking tools: You can prove a negative. Think 4(10): 109–112. <[URL]> (30 January 2016).
Kilgarriff, Adam. 2015. How many words are there? In The Oxford Handbook of the Word, John R. Taylor (ed.), 29–37. Oxford: OUP.
Michel, Jean-Baptiste, Shen, Yuan Kui, Aiden, Aviva Presser, Veres, Adrian, Gray, Matthew K., Pickett, Joseph P., Hoiberg, Dale et al. 2011. Quantitative analysis of culture using millions of digitized books. Science 331(6014): 176–182. <[URL]> (29 November 2015).
Milton, James & Donzelli, Giovanna. 2013. The lexicon. In The Cambridge Handbook of Second Language Acquisition, Julia Herschensohn & Martha Young-Scholten (eds), 441–460. Cambridge: CUP.
OED Online: Key to frequency. 2015. OED Online. <[URL]> (4 February 2016).
Petersen, Alexander M., Tenenbaum, Joel, Havlin, Shlomo & Stanley, H. Eugene. 2012. Statistical laws governing fluctuations in word use from word birth to word death. Scientific Reports 2. <[URL]> (29 November 2015).
TestYourVocab.com. <[URL]> (29 November 2015).
Cited by (14)
Mansfield, John
Vogelsanger, Johanna
von der Fecht-Fernández, Sara
Smith, Chris A.
Drury, Brett & Samuel Morais Drury
Cunha, Evandro L.T.P. & Søren Wichmann
Francis, David, Ella Rabinovich, Farhan Samir, David Mortensen & Suzanne Stevenson
Kranich, Svenja & Tine Breban
Rudnicka, Karolina
Säily, Tanja & Jukka Tyrkkö
Tichý, Ondřej
2021. Corpus driven identification of lexical bundle obsolescence in Late Modern English. In Lost in Change [Studies in Language Companion Series, 218], pp. 101 ff.
Tyrkkö, Jukka
2019. Kinship references in the British Parliament, 1800–2005. In Reference and Identity in Public Discourses [Pragmatics & Beyond New Series, 306], pp. 97 ff.
This list is based on CrossRef data as of 1 December 2025. Please note that it may not be complete. Sources presented here have been supplied by the respective publishers. Any errors therein should be reported to them.
