Article published in: International Journal of Corpus Linguistics
Vol. 19:3 (2014) ► pp.401–416
Making Google Books n-grams useful for a wide range of research on language change
Published online: 1 September 2014
https://doi.org/10.1075/ijcl.19.3.04dav
The “standard” Google Books n-grams were released by Google in 2010, and they include more than 155 billion words for American English alone. Unfortunately, the standard interface is too simplistic to support many types of useful research on this massive dataset. In this paper, I discuss an alternative “advanced” architecture and interface for these datasets, which is freely available at googlebooks.byu.edu. This resource allows for a wide range of research on lexical, phraseological, syntactic, and semantic changes in English that would not be possible with the standard interface. With this new resource, researchers have access to hundreds of billions of words of data and can map out changes in English in ways that were not previously possible.
Keywords: Google Books, historical, syntactic, semantic, lexical
