Making Google Books n-grams useful for a wide range of research on language change

Davies, Mark

doi:10.1075/ijcl.19.3.04dav

Article published in: International Journal of Corpus Linguistics, Vol. 19:3 (2014), pp. 401–416

Mark Davies | Brigham Young University

Published online: 1 September 2014

The “standard” Google Books n-grams were released by Google in 2010, and the American English dataset alone contains more than 155 billion words. Unfortunately, the standard interface is far too simplistic to support many types of useful research on this massive dataset. In this paper, I discuss an alternative “advanced” architecture and interface for these datasets, which is freely available at googlebooks.byu.edu. This resource supports a wide range of research on lexical, phraseological, syntactic, and semantic changes in English that would not be possible with the standard interface. Researchers thus have access to hundreds of billions of words of data and can map out changes in English in ways that were not previously possible.

Keywords: Google Books, historical, syntactic, semantic, lexical
