In: Crossing Boundaries through Corpora: Innovative corpus approaches within and beyond linguistics
Edited by Sarah Buschfeld, Patricia Ronan, Theresa Neumaier, Andreas Weilinghoff and Lisa Westermayer
[Studies in Corpus Linguistics 119] 2024
pp. 125–152
Chapter 6. Establishing a ‘new normal’
Detecting fluctuating trends in word frequency over time
Published online: 17 October 2024
https://doi.org/10.1075/scl.119.06gee
Abstract
In this chapter we introduce statistical methods and associated visualisations for the analysis of lexical change on a monthly basis in a 1.8-billion-word news corpus spanning over 30 years. In previous work (Kehoe et al. 2022) we found examples of word frequency change in a data-driven manner by applying existing statistical tests. An ongoing limitation is that, as our diachronic corpus grows, so too does the possibility of a word exhibiting multiple frequency changes in different directions. This chapter reframes the problem as one of time-series segmentation, dividing the frequency history of a word into timespans exhibiting consistent upward or downward change. We then determine reasons for such changes by applying horizon graph visualisations to collocates.
Keywords: lexical change, collocation, time series, visualisation, diachronic change
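To make the idea of time-series segmentation concrete, the following is a minimal Python sketch, not the procedure used in the chapter. It assumes that monthly raw counts and monthly corpus sizes are available as lists, normalises them to frequency per million words, and splits the series into maximal runs of month-on-month rises, falls or flats. The function names and the toy figures are invented for illustration.

```python
from itertools import groupby


def per_million(counts, totals):
    """Normalise raw monthly counts to frequency per million words."""
    return [1_000_000 * c / t for c, t in zip(counts, totals)]


def segment_by_direction(values):
    """Split a series into maximal runs of consecutive rises, falls or flats.

    Returns (direction, start_index, end_index) tuples. Indices refer to
    positions in `values`, and adjacent segments share an endpoint.
    """
    def sign(x):
        return (x > 0) - (x < 0)

    # Direction of each month-on-month step: 1 (rise), -1 (fall), 0 (no change).
    steps = [sign(b - a) for a, b in zip(values, values[1:])]
    labels = {1: "up", -1: "down", 0: "flat"}

    segments, pos = [], 0
    for direction, run in groupby(steps):
        length = len(list(run))
        segments.append((labels[direction], pos, pos + length))
        pos += length
    return segments


# Invented toy data: ten months of counts in a corpus of constant size.
counts = [10, 12, 15, 20, 18, 14, 9, 9, 11, 16]
totals = [1_000_000] * 10

freq = per_million(counts, totals)
segments = segment_by_direction(freq)
for direction, start, end in segments:
    print(f"{direction}: month {start} to month {end}")
```

On real monthly data a sign-of-difference split like this would fragment on noise, so some smoothing or a statistical test of trend would be needed before the resulting timespans could be interpreted.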
Article outline
- 1. Introduction and background
- 1.1 From corpus to time series
- 1.2 Identifying changes in frequency
- 1.3 Exploring collocational change
- 2. Method
- 2.1 Visualising collocational change
- 3. Results and discussion
- 3.1 Downward trends
- 3.2 Upward trends
- 3.3 Multiple abrupt shifts
- 3.4 Multiple trends
- 4. Conclusion
- Notes
- References
References (27)
Baker, Helen, Brezina, Vaclav & McEnery, Tony. 2017. Ireland in British parliamentary debates 1803–2005: Plotting changes in discourse in a large volume of time-series corpus data. In Exploring Future Paths for Historical Sociolinguistics [Advances in Historical Sociolinguistics 7], Tanja Säily, Arja Nurmi, Minna Palander-Collin & Anita Auer (eds), 83–107. Amsterdam: John Benjamins.
Boulton, Chris & Lenton, Timothy. 2019. A new method for detecting abrupt shifts in time series. F1000Research 8: 746.
Clarke, Isobelle, Brookes, Gavin & McEnery, Tony. 2022. Keywords through time: Tracking changes in press discourse of Islam. International Journal of Corpus Linguistics 27(4): 399–427.
Collier, Alex. 1993. Issues of large-scale collocational analysis. In English Language Corpora: Design, Analysis and Exploitation, Jan Aarts, Pieter de Haan & Nelleke Oosdijk (eds), 289–298. Amsterdam: Rodopi.
Cox, David. 1952. Sequential tests for composite hypotheses. Mathematical Proceedings of the Cambridge Philosophical Society 48(2): 290–299.
Davies, Mark. 2013. Corpus of News on the Web (NOW): 3+ billion words from 20 countries, updated every day. <[URL]> (18 July 2023).
Gao, Jianbo, Hu, Jing, Mao, Xiang & Perc, Matjaž. 2012. Culturomics meets random fractal theory: Insights into long-range correlations of social and natural phenomena over the past two centuries. Journal of The Royal Society Interface 9: 1956–1964.
Greenfield, Patricia. 2013. The changing psychology of culture from 1800 through 2000. Psychological Science 24(9): 1722–1731.
Grieve, Jack, Nini, Andrea & Guo, Diansheng. 2017. Analyzing lexical emergence in modern American English online. English Language and Linguistics 21(1): 99–127.
Harrower, Mark & Brewer, Cynthia. 2003. ColorBrewer.org: An online tool for selecting colour schemes for maps. The Cartographic Journal 40(1): 27–37.
Heer, Jeffrey, Kong, Nicholas & Agrawala, Maneesh. 2009. Sizing the horizon: The effects of chart size and layering on the graphical perception of time series visualizations. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 1303–1312. New York NY: Association for Computing Machinery.
Hilpert, Martin. 2020. The great temptation: What diachronic corpora do and do not reveal about social change. In Corpora and the Changing Society. Studies in the Evolution of English [Studies in Corpus Linguistics 96], Paula Rautionaho, Arja Nurmi & Juhani Klemola (eds), 3–27. Amsterdam: John Benjamins.
Kehoe, Andrew & Gee, Matt. 2009. Weaving Web data into a diachronic corpus patchwork. In Corpus Linguistics: Refinements and Reassessments, Antoinette Renouf & Andrew Kehoe (eds), 255–279. Amsterdam: Rodopi.
Kehoe, Andrew, Gee, Matt & Renouf, Antoinette. 2022. A data-driven approach to finding significant changes in language use through time series analysis. In Broadening the Spectrum of Corpus Linguistics: New Approaches to Variability and Change [Studies in Corpus Linguistics 105], Susanne Flach & Martin Hilpert (eds), 285–318. Amsterdam: John Benjamins.
McEnery, Tony, Brezina, Vaclav & Baker, Helen. 2019. Usage Fluctuation Analysis: A new way of analysing shifts in historical discourse. International Journal of Corpus Linguistics 24(4): 413–444.
Michel, Jean-Baptiste, Shen, Yuan Kui, Aiden, Aviva P., Veres, Adrian, Gray, Matthew K., The Google Books Team, Pickett, Joseph P., Hoiberg, Dale, Clancy, Dan, Norvig, Peter, Orwant, Jon, Pinker, Steven, Nowak, Martin A. & Lieberman Aiden, Erez. 2011. Quantitative analysis of culture using millions of digitized books. Science 331(6014): 176–182.
Murakami, Akira, Thompson, Paul, Hunston, Susan & Vajn, Dominik. 2017. ‘What is this corpus about?’: Using topic modelling to explore a specialised corpus. Corpora 12(2): 243–277.
Renouf, Antoinette. 2013. A finer definition of neology in English: The life-cycle of a word. In Corpus Perspectives on Patterns of Lexis [Studies in Corpus Linguistics 57], Hilde Hasselgård, Jarle Ebeling & Signe Oksefjell Ebeling (eds), 177–208. Amsterdam: John Benjamins.
Renouf, Antoinette. 2018. Big data: Opportunities and challenges for English corpus linguistics. In From Data to Evidence in English Language Research, Carla Suhr, Terttu Nevalainen & Irma Taavitsainen (eds), 27–65. Leiden: Brill.
Renouf, Antoinette & Kehoe, Andrew. 2013. Filling the gaps: Using the WebCorp Linguist’s Search Engine to supplement existing text resources. International Journal of Corpus Linguistics 18(2): 167–198.
Saito, Takafumi, Miyamura, Hiroko Nakamura, Yamamoto, Mitsuyoshi, Saito, Hiroki, Hoshiya, Yuka & Kaseda, Takumi. 2005. Two-tone pseudo coloring: Compact visualization for one-dimensional data. In IEEE Symposium on Information Visualization INFOVIS 2005, 173–180. IEEE.
Schlechtweg, Dominik, McGillivray, Barbara, Hengchen, Simon, Dubossarsky, Haim & Tahmasebi, Nina. 2020. SemEval-2020 Task 1: Unsupervised lexical semantic change detection. arXiv:2007.11464.
Schneider, Gerold. 2022. Systematically detecting patterns of social, historical and linguistic change: The framing of poverty in times of poverty. Transactions of the Philological Society 120(3): 447–473.
