In: Exploring Future Paths for Historical Sociolinguistics
Edited by Tanja Säily, Arja Nurmi, Minna Palander-Collin and Anita Auer
[Advances in Historical Sociolinguistics 7] 2017
pp. 23–52
Exploring part-of-speech frequencies in a sociohistorical corpus of English
Published online: 19 December 2017
https://doi.org/10.1075/ahs.7.02sai
We investigate the usefulness of part-of-speech (POS) annotation as a tool in the study of sociolinguistic variation and genre evolution. We analyse how POS ratios change over time in the Parsed Corpus of Early English Correspondence (c.1410–1681), which social groups lead the changes, and whether the changes can be connected to colloquialisation with regard to reduced complexity or an increasingly involved style. While we find gentry-led colloquialisation in terms of noun and verb frequencies as well as evidence for gendered styles, the results on structural complexity are more mixed. We argue that POS annotation can be a useful tool when complemented by a thorough textual analysis, but that more fine-grained categories are needed to reach firmer conclusions.
Article outline
- 1. Introduction
- 2. Background
- 2.1 POS ratios in the study of (sociolinguistic) variation
- 2.2 Complexity in the genre of personal correspondence
- 3. Material and method
- 3.1 PCEEC and ReCEEC
- 3.2 Visualisation
- 4. Analysis
- 4.1 Complexity in the Parsed Corpus of Early English Correspondence
- 4.2 Colloquialisation and gendered styles
- 5. Discussion and conclusion
- Acknowledgements
- Notes
- References
- Appendix
References (56)
Argamon, Shlomo, Moshe Koppel, Jonathan Fine & Anat Rachel Shimoni. 2003. Gender, genre, and writing style in formal written texts. Text 23(3). 321–346.
Atzmueller, Martin. 2015. Subgroup discovery. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 5(1). 35–49.
Bamman, David, Jacob Eisenstein & Tyler Schnoebelen. 2014. Gender identity and lexical variation in social media. Journal of Sociolinguistics 18(2). 135–160.
Biber, Douglas. 1988. Variation across speech and writing. Cambridge: Cambridge University Press.
Biber, Douglas. 1992. On the complexity of discourse complexity: A multidimensional analysis. Discourse Processes 15(2). 133–163.
Biber, Douglas & Jena Burges. 2000. Historical change in the language use of women and men: Gender differences in dramatic dialogue. Journal of English Linguistics 28(1). 21–37.
Biber, Douglas & Susan Conrad. 2009. Register, genre, and style (Cambridge Textbooks in Linguistics). Cambridge: Cambridge University Press.
Biber, Douglas & Edward Finegan. 1989. Drift and the evolution of English style: A history of three genres. Language 65(3). 487–517.
Biber, Douglas & Edward Finegan. 1997. Diachronic relations among speech-based and written registers in English. In Terttu Nevalainen & Leena Kahlas-Tarkka (eds.), To explain the present: Studies in the changing English language in honour of Matti Rissanen (Mémoires de la Société Néophilologique de Helsinki 52), 253–275. Helsinki: Société Néophilologique.
Biber, Douglas & Bethany Gray. 2010. Being specific about historical change: The influence of sub-register. Journal of English Linguistics 41(2). 104–134.
Biber, Douglas & Bethany Gray. 2011. The historical shift of scientific academic prose in English towards less explicit styles of expression: Writing without verbs. In Vijay Bhatia, Purificación Sánchez Hernández & Pascual Pérez-Paredes (eds.), Researching specialized languages (Studies in Corpus Linguistics 47), 11–24. Amsterdam: John Benjamins.
Biber, Douglas, Bethany Gray & Shelley Staples. 2016. Predicting patterns of grammatical complexity across language exam task types and proficiency levels. Applied Linguistics 37(5). 639–668.
Carpenter, Bob, Andrew Gelman, Matt Hoffman, Daniel Lee, Ben Goodrich, Michael Betancourt, Marcus Brubaker, Jiqiang Guo, Peter Li & Allen Riddell. 2017. Stan: A probabilistic programming language. Journal of Statistical Software 76(1).
Chafe, Wallace. 1982. Integration and involvement in speaking, writing, and oral literature. In Deborah Tannen (ed.), Spoken and written language, 35–53. Norwood, NJ: Ablex.
Hardie, Andrew. 2007. Part-of-speech ratios in English corpora. International Journal of Corpus Linguistics 12(1). 55–81.
Heylighen, Francis & Jean-Marc Dewaele. 2002. Variation in the contextuality of language: An empirical measure. Foundations of Science 7(3). 293–340.
Hinneburg, Alexander, Heikki Mannila, Samuli Kaislaniemi, Terttu Nevalainen & Helena Raumolin-Brunberg. 2007. How to handle small samples: Bootstrap and Bayesian methods in the analysis of linguistic change. Literary and Linguistic Computing 22(2). 137–150.
Huddleston, Rodney & Geoffrey K. Pullum (eds.). 2002. The Cambridge grammar of the English language. Cambridge: Cambridge University Press.
Kohnen, Thomas. 2007. ‘Connective profiles’ in the history of English texts. Aspects of orality and literacy. In Ursula Lenker & Anneli Meurman-Solin (eds.), Connectives in the history of English, 289–308. Amsterdam: John Benjamins.
Labov, William. 1982. Building on empirical foundations. In Winfred P. Lehmann & Yakov Malkiel (eds.), Perspectives on historical linguistics: Papers from a conference held at the meeting of the Language Theory Division, Modern Language Assn, San Francisco, 27–30 December 1979 (Current Issues in Linguistic Theory 24), 17–92. Amsterdam: John Benjamins.
Labov, William. 1990. The intersection of sex and social class in the course of linguistic change. Language Variation and Change 2(2). 205–254.
Labov, William. 1994. Principles of linguistic change, volume 1: Internal factors. Oxford: Blackwell.
Lehto, Anu. 2015. The genre of Early Modern English statutes: Complexity in historical legal language (Mémoires de la Société Néophilologique de Helsinki 97). Helsinki: Société Néophilologique.
Mair, Christian, Marianne Hundt, Geoffrey Leech & Nicholas Smith. 2002. Short term diachronic shifts in part-of-speech frequencies. A comparison of the tagged LOB and F-LOB corpora. International Journal of Corpus Linguistics 7(2). 245–264.
Mäkelä, Eetu, Tanja Säily & Terttu Nevalainen. 2016. Khepri – a modular view-based tool for exploring (historical sociolinguistic) data. In Maciej Eder & Jan Rybicki (eds.), Digital Humanities 2016: Conference abstracts, 269–272. Kraków: Jagiellonian University & Pedagogical University.
Markus, Manfred. 2001. The development of prose in Early Modern English in view of the gender question: Using grammatical idiosyncracies of 15th and 17th century letters. European Journal of English Studies 5(2). 181–196.
Meurman-Solin, Anneli. 2011. Utterance-initial connective elements in early Scottish epistolary prose. In Anneli Meurman-Solin & Ursula Lenker (eds.), Connectives in synchrony and diachrony in European languages (Studies in Variation, Contacts and Change in English 8). Helsinki: VARIENG. [URL] (17 December, 2016.)
Nevala, Minna. 2004. Address in early English correspondence: Its forms and socio-pragmatic functions (Mémoires de la Société Néophilologique de Helsinki 64). Helsinki: Société Néophilologique.
Nevalainen, Terttu. 2002. Language and woman’s place in earlier English. Journal of English Linguistics 30(2). 181–199.
Nevalainen, Terttu & Helena Raumolin-Brunberg. 2003. Historical sociolinguistics: Language change in Tudor and Stuart England (Longman Linguistics Library). London: Pearson Education.
Newman, Matthew L., Carla J. Groom, Lori D. Handelman & James W. Pennebaker. 2008. Gender differences in language use: An analysis of 14,000 text samples. Discourse Processes 45(3). 211–236.
Palander-Collin, Minna. 1999. Grammaticalization and social embedding: I THINK and METHINKS in Middle and Early Modern English (Mémoires de la Société Néophilologique de Helsinki 55). Helsinki: Société Néophilologique.
Palander-Collin, Minna. 2000. The language of husbands and wives in seventeenth-century correspondence. In Christian Mair & Marianne Hundt (eds.), Corpus linguistics and linguistic theory. Papers from the twentieth International Conference on English Language Research on Computerized Corpora (ICAME 20), Freiburg im Breisgau 1999 (Language and Computers: Studies in Practical Linguistics 33), 289–300. Amsterdam: Rodopi.
PCEEC = Parsed Corpus of Early English Correspondence, tagged version. 2006. Annotated by Arja Nurmi, Ann Taylor, Anthony Warner, Susan Pintzuk & Terttu Nevalainen. Compiled by the CEEC Project Team. York: University of York & Helsinki: University of Helsinki. Distributed through the Oxford Text Archive. [URL] (17 December, 2016.)
Quirk, Randolph, Sidney Greenbaum, Geoffrey Leech & Jan Svartvik. 1985. A comprehensive grammar of the English language. London: Longman.
R Core Team. 2016. R: A language and environment for statistical computing. Vienna: R Foundation for Statistical Computing. [URL] (17 December, 2016.)
Raumolin-Brunberg, Helena & Terttu Nevalainen. 2007. Historical sociolinguistics: The Corpus of Early English Correspondence. In Joan C. Beal, Karen P. Corrigan & Hermann L. Moisl (eds.), Creating and digitizing language corpora, volume 2: Diachronic databases, 148–171. Houndmills: Palgrave Macmillan.
Rayson, Paul, Geoffrey Leech & Mary Hodges. 1997. Social differentiation in the use of English vocabulary: Some analyses of the conversational component of the British National Corpus. International Journal of Corpus Linguistics 2(1). 133–152.
Rescher, Nicholas. 1998. Complexity: A philosophical overview. New Brunswick, NJ: Transaction Publishers.
Säily, Tanja, Terttu Nevalainen & Harri Siirtola. 2011. Variation in noun and pronoun frequencies in a sociohistorical corpus of English. Literary and Linguistic Computing 26(2). 167–188.
Santorini, Beatrice. 2016. Annotation manual for the Penn Historical Corpora and the York-Helsinki Corpus of Early English Correspondence. [URL] (17 December, 2016.)
Siirtola, Harri, Poika Isokoski, Tanja Säily & Terttu Nevalainen. 2016. Interactive text visualization with Text Variation Explorer. In Ebad Banissi, Mark W. McK. Bannatyne, Fatma Bouali, Remo Burkhard, John Counsell, Urska Cvek, Martin J. Eppler, Georges Grinstein, Wei Dong Huang, Sebastian Kernbach, Chun-Cheng Lin, Feng Lin, Francis T. Marchese, Chi Man Pun, Muhammad Sarfraz, Marjan Trutschl, Anna Ursyn, Gilles Venturini, Theodor G. Wyeld & Jian J. Zhang (eds.), Proceedings of the 20th international conference on Information Visualisation (IV 2016), 330–335. Los Alamitos, CA: IEEE Computer Society.
Siirtola, Harri, Terttu Nevalainen, Tanja Säily & Kari-Jouko Räihä. 2011. Visualisation of text corpora: A case study of the PCEEC. In Terttu Nevalainen & Susan M. Fitzmaurice (eds.), How to deal with data: Problems and approaches to the investigation of the English language over time and space (Studies in Variation, Contacts and Change in English 7). Helsinki: VARIENG. [URL] (17 December, 2016.)
Siirtola, Harri, Tanja Säily, Terttu Nevalainen & Kari-Jouko Räihä. 2014. Text Variation Explorer: Towards interactive visualization tools for corpus linguistics. International Journal of Corpus Linguistics 19(3). 417–429.
Smitterberg, Erik. 2008. The progressive and phrasal verbs: Evidence of colloquialization in nineteenth-century English? In Terttu Nevalainen, Irma Taavitsainen, Päivi Pahta & Minna Korhonen (eds.), The dynamics of linguistic variation: Corpus evidence on English past and present (Studies in Language Variation 2), 269–289. Amsterdam: John Benjamins.
Tannen, Deborah. 1991. You just don’t understand: Women and men in conversation. New York: Morrow and Company.
Taylor, Ann. 2007. The York-Toronto-Helsinki Parsed Corpus of Old English Prose. In Joan C. Beal, Karen P. Corrigan & Hermann L. Moisl (eds.), Creating and digitizing language corpora, volume 2: Diachronic databases, 196–227. Houndmills: Palgrave Macmillan.
Taylor, Ann & Beatrice Santorini. 2006. The Parsed Corpus of Early English Correspondence. University of York. [URL] (17 December, 2016.)
Vartiainen, Turo, Tanja Säily & Mikko Hakala. 2013. Variation in pronoun frequencies in early English letters: Gender-based or relationship-based? In Jukka Tyrkkö, Olga Timofeeva & Maria Salenius (eds.), Ex philologia lux: Essays in honour of Leena Kahlas-Tarkka (Mémoires de la Société Néophilologique de Helsinki 90), 233–255. Helsinki: Société Néophilologique.
Cited by (6)
Säily, Tanja, Martin Hilpert & Jukka Suomela
2024. New approaches to investigating change in derivational productivity. In Crossing Boundaries through Corpora [Studies in Corpus Linguistics, 119], pp. 8 ff.
Säily, Tanja, Turo Vartiainen, Harri Siirtola & Terttu Nevalainen
2024. Changing styles of letter-writing? In Unlocking the History of English [Current Issues in Linguistic Theory, 364], pp. 154 ff.
Vartiainen, Turo & Tanja Säily
2024. Engaging with bad (meta)data in historical corpus linguistics. In Challenges in corpus linguistics [Studies in Corpus Linguistics, 118], pp. 9 ff.
Saario, Lassi, Tanja Säily, Samuli Kaislaniemi & Terttu Nevalainen
Leiwo, Martti
Rudnicka, Karolina
2018. Variation of sentence length across time and genre. In Diachronic Corpora, Genre, and Language Change [Studies in Corpus Linguistics, 85], pp. 219 ff.
This list is based on CrossRef data as of 7 March 2026 and may not be complete.
