In: Broadening the Spectrum of Corpus Linguistics: New approaches to variability and change
Edited by Susanne Flach and Martin Hilpert
[Studies in Corpus Linguistics 105] 2022
pp. 285–318
A data-driven approach to finding significant changes in language use through time series analysis
Published online: 10 November 2022
https://doi.org/10.1075/scl.105.10keh
Abstract
This paper conducts a diachronic study of language change in a corpus covering almost 30 years of mainstream UK news text. In our previous studies, several databases were compiled from the corpus, including diachronic records of word frequency, collocation and morphological analysis. Upon user enquiry, our WebCorp Linguist’s Search Engine produced tailored output from these resources. The system was therefore passive, requiring a word or phrase to be specified before querying the databases. The aim now is to extend the data-driven functionality to track the frequency of words in the corpus across time automatically and alert users to statistically significant change patterns. Three tests are employed to find upward and downward trends, sudden jumps in frequency, and seasonal variation.
Keywords: diachrony, language change, variation, statistics, methods
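The three kinds of change named in the abstract (trends, seasonal variation and sudden jumps) can be illustrated with a minimal sketch over a single word's frequency series. The abstract does not say which statistical tests the chapter uses, so everything below is an assumption chosen for illustration: a Mann-Kendall style statistic for trend, the share of variance explained by calendar position for seasonality, and a median-based spike rule for jumps. The function names, the 12-month period and the threshold are all hypothetical.

```python
import math
from statistics import mean, median


def trend_z(series):
    """Mann-Kendall style z-score (no tie correction); an assumed stand-in test.

    Positive values suggest an upward trend, negative values a downward one.
    """
    n = len(series)
    # S counts rising pairs minus falling pairs over all ordered pairs (i < j).
    s = sum(
        (series[j] > series[i]) - (series[j] < series[i])
        for i in range(n - 1)
        for j in range(i + 1, n)
    )
    if s == 0:
        return 0.0
    var_s = n * (n - 1) * (2 * n + 5) / 18
    # Continuity correction: shrink S by one unit towards zero.
    return (s - math.copysign(1, s)) / math.sqrt(var_s)


def seasonal_strength(series, period=12):
    """Share of total variance explained by the mean of each calendar position.

    Assumes at least one full period of data and no strong trend; a trend
    should be removed first, otherwise it dilutes the seasonal share.
    """
    overall = mean(series)
    phase_means = [mean(series[p::period]) for p in range(period)]
    total = sum((x - overall) ** 2 for x in series)
    between = sum(
        (phase_means[i % period] - overall) ** 2 for i in range(len(series))
    )
    return between / total if total else 0.0


def sudden_jumps(series, threshold=5.0):
    """Indices lying more than `threshold` robust units above the median.

    For otherwise rare words the median absolute deviation is often zero,
    so a scale of 1.0 is assumed in that case.
    """
    med = median(series)
    mad = median([abs(x - med) for x in series])
    scale = mad if mad > 0 else 1.0
    return [i for i, x in enumerate(series) if (x - med) / scale > threshold]


if __name__ == "__main__":
    # Synthetic monthly frequencies, invented purely for illustration.
    months = range(120)
    rising = [0.5 * m for m in months]
    seasonal = [10 + 5 * math.sin(2 * math.pi * m / 12) for m in months]
    spike = [0.0] * 50 + [40.0] + [0.0] * 69
    print("trend z:", trend_z(rising))
    print("seasonal share:", seasonal_strength(seasonal))
    print("jump positions:", sudden_jumps(spike))
```

In an automatic alerting setting, each word's series would be passed through all three functions and flagged when a statistic crosses a chosen cut-off, for example an absolute trend z-score above 1.96. The cut-offs here are placeholders rather than values taken from the chapter.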
Article outline
- 1. Introduction
- 2. Data and methods
- 2.1 Trends
- 2.2 Seasonality
- 2.3 Sudden jumps (for otherwise rare words)
- 2.4 Seasonal and trend decomposition using Loess (STL)
- 3. Results and discussion
- 3.1 Seasonality test
- 3.2 Trend test
- 3.3 Sudden jumps test
- 4. Conclusions and future work
- Notes
- References
References (14)
Cleveland, Robert B., Cleveland, William S., McRae, Jean E. & Terpenning, Irma. 1990. STL: A seasonal-trend decomposition procedure based on loess. Journal of Official Statistics 6(1): 3–33.
Cleveland, William S. 1981. LOWESS: A program for smoothing scatter plots by robust locally weighted regression. The American Statistician 35: 54.
Cox, David R. 1963. Large sample sequential tests for composite hypotheses. Sankhyā: The Indian Journal of Statistics, Series A (1961–2002) 25: 5–12.
Davies, Mark. 2013. Corpus of News on the Web (NOW): 3+ billion words from 20 countries, updated every day, <[URL]> (26 August 2020).
Eisenstein, Jacob, O’Connor, Brendan, Smith, Noah A. & Xing, Eric P. 2014. Diffusion of lexical change in social media. PLoS ONE 9(11): e113114.
Gao, Jianbo, Hu, Jing, Mao, Xiang & Perc, Matjaž. 2012. Culturomics meets random fractal theory: Insights into long-range correlations of social and natural phenomena over the past two centuries. Journal of The Royal Society Interface 9: 1956–1964.
Grieve, Jack, Nini, Andrea & Guo, Diansheng. 2017. Analyzing lexical emergence in modern American English online. English Language and Linguistics 21(1): 99–127.
Kehoe, Andrew & Gee, Matt. 2009. Weaving web data into a diachronic corpus patchwork. In Corpus Linguistics: Refinements and Reassessments [Language and Computers 69], Antoinette Renouf & Andrew Kehoe (eds), 255–279. Amsterdam: Rodopi.
Kehoe, Andrew & Gee, Matt. 2019. “Thanks for the donds”: A corpus linguistic analysis of topic-based communities in the comment section of The Guardian. In Reference and Identity in Public Discourses [Pragmatics & Beyond New Series 306], Ursula Lutzky & Minna Nevala (eds), 127–158. Amsterdam: John Benjamins.
Michel, Jean-Baptiste, Shen, Yuan Kui, Aiden, Aviva Presser, Veres, Adrian, Gray, Matthew K., The Google Books Team, Pickett, Joseph P., Hoiberg, Dale, Clancy, Dan, Norvig, Peter, Orwant, Jon, Pinker, Steven, Nowak, Martin A. & Aiden, Erez Lieberman. 2011. Quantitative analysis of culture using millions of digitized books. Science 331(6014): 176–182.
Renouf, Antoinette. 2013. A finer definition of neology in English: The life-cycle of a word. In Corpus Perspectives on Patterns of Lexis [Studies in Corpus Linguistics 57], Hilde Hasselgård, Signe Oksefjell Ebeling & Jarle Ebeling (eds), 177–208. Amsterdam: John Benjamins.
Renouf, Antoinette. 2018. Big Data: Opportunities and challenges for English corpus linguistics. In From Data to Evidence in English Language Research, Carla Suhr, Terttu Nevalainen & Irma Taavitsainen (eds), 27–65. Leiden: Brill.
Cited by (3)
Jiang, Zeyuan, Jiahao Chen & Zhanting Bu
Gee, Matt, Andrew Kehoe & Antoinette Renouf
2024. Establishing a ‘new normal’. In Crossing Boundaries through Corpora [Studies in Corpus Linguistics 119], pp. 125 ff.
This list is based on CrossRef data as of 1 December 2025. Please note that it may not be complete. Sources presented here have been supplied by the respective publishers. Any errors therein should be reported to them.
