Charting poverty: Changes in society and language

Schneider, Gerold

doi:10.1075/scl.96.02sch

In: Corpora and the Changing Society: Studies in the evolution of English
Edited by Paula Rautionaho, Arja Nurmi and Juhani Klemola
[Studies in Corpus Linguistics 96] 2020
pp. 29–56

Gerold Schneider | University of Zurich

Published online: 8 April 2020

https://doi.org/10.1075/scl.96.02sch

Abstract

This study addresses how societal and linguistic changes can be detected in large historical corpora, in particular EEBO and CLMET3.0, taking poverty and the industrial revolution as a case study. The results, based on a rich array of state-of-the-art statistical approaches (such as kernel density estimation), show how poverty, the industrial revolution, and urbanization are linked, for instance through the associations between war, religion, family, poverty, and suffering. The study also discusses the importance of data size and cleanness, the temptations of distant reading, and the necessity of validating the discovered patterns through the interaction of close and distant reading.

Article outline

1. Introduction
2. Data and pre-processing
- 2.1 The EEBO Collection as sampler corpus
- 2.2 The CLMET3.0 corpus
- 2.3 The pre-processing step of spelling normalization
3. Methods
- 3.1 Data-based and data-driven approaches
- 3.2 Document classification
- 3.3 Topic modelling
- 3.4 Conceptual maps
4. Results and discussion
- 4.1 Dictionary-based approach
- 4.2 Topic modelling
  - 4.2.1 EEBO early vs. EEBO late
  - 4.2.2 Adding CLMET3.0 and increasing the number of topics
- 4.3 Conceptual maps
5. Conclusions
Notes
References

References (43)

Corpora and software

CLMET3.0 = Corpus of Late Modern English Texts, version 3.0. De Smet, Hendrik, Diller, Hans-Jürgen & Tyrkkö, Jukka (comps). <[URL]>

EEBO = Early English Books Online. Davies, Mark (comp.). <[URL]>

Gephi = <[URL]>

Mallet = Machine Learning for LanguagE Toolkit. <[URL]>

Textplot = <[URL]>

VARD2 = Baron, Alistair and Rayson, Paul. 2008. VARD 2: A tool for dealing with spelling variation in historical corpora. Proceedings of the Postgraduate Conference in Corpus Linguistics, Aston University, Birmingham, UK, 22 May 2008.

Other references

Ananiadou, Sophia, Kell, Douglas B. & Tsujii, Jun-ichi. 2006. Text mining and its potential applications in systems biology. Trends in Biotechnology 24(12): 571–579.

Baroni, Marco & Lenci, Alessandro. 2010. Distributional memory: A general framework for corpus-based semantics. Computational Linguistics 36(4): 673–721.

Bartsch, Sabine & Evert, Stefan. 2014. Towards a Firthian notion of collocation. In Vernetzungsstrategien, Zugriffsstrukturen und automatisch ermittelte Angaben in Internetwörterbüchern [OPAL - Online publizierte Arbeiten zur Linguistik 2/2014], Andrea Abel & Lothar Leimnitz (eds), 48–61. Mannheim: Institut für Deutsche Sprache.

Blei, David. 2012. Probabilistic topic models. Communications of the ACM 55(4): 77–84.

Bybee, Joan. 2007. Frequency of Use and the Organization of Language. Oxford: OUP.

Church, Kenneth. 2000. Empirical estimates of adaptation: The chance of Two Noriegas is closer to p/2 than p². Proceedings of the 17th Conference on Computational Linguistics, 180–186. Stroudsburg, PA: Association for Computational Linguistics.

C. W. 2013. Did living standards improve during the Industrial Revolution? The Economist, September 13, 2013. <[URL]> (30 December 2018).

Daudin, Guillaume, O’Rourke, Kevin H. & Prados de la Escosura, Leandro. 2008. Trade and empire, 1700–1870. Technical Report 2008–24, OFCE: Centre de recherche en économie de Sciences Po. <[URL]> (30 December 2018).

De Smet, Hendrik. 2005. A corpus of Late Modern English. ICAME Journal 29: 69–82.

Evert, Stefan. 2008. Corpora and collocations. In Corpus Linguistics. An International Handbook, Anke Lüdeling & Merja Kytö (eds), 1212–1248. Berlin: Mouton De Gruyter.

Food and Agriculture Organisation of the United Nations. November 2003. Anti-hunger Programme. A twin-track approach to hunger reduction: priorities for national and international action. <[URL]>.

Firth, John Rupert. 1957. A synopsis of linguistic theory 1930–1955. In Studies in Linguistic Analysis [Special volume of the Philological Society], John Rupert Firth (ed.), 1–32. Oxford: Blackwell.

Glynn, Dylan. 2010. Corpus-driven cognitive semantics. Introduction to the field. In Quantitative Methods in Cognitive Semantics: Corpus-Driven Approaches [Cognitive Linguistics Research 46], Dylan Glynn & Kerstin Fischer (eds), 1–42. Berlin: Mouton de Gruyter.

Gries, Stefan T. 2010. Corpus linguistics and theoretical linguistics: a love-hate relationship? Not necessarily… International Journal of Corpus Linguistics 15(3): 327–343.

Grimmer, Justin & Stewart, Brandon. 2013. Text as data: The promise and pitfalls of automatic content analysis methods for political texts. Political Analysis 21(3): 267–297.

Hatton, Timothy J. & Bray, Bernice E. 2010. Long run trends in the heights of European men, 19th–20th centuries. Economics & Human Biology 8(3): 405–413.

Hilpert, Martin & Gries, Stefan T. 2016. Quantitative approaches to diachronic corpus linguistics. In The Cambridge Handbook of English Historical Linguistics, Merja Kytö & Päivi Pahta (eds), 36–53. Cambridge: CUP.

Janda, Linda A. 2013. Cognitive Linguistics: The Quantitative Turn. Berlin: Mouton de Gruyter.

Jurafsky, Daniel & Martin, James H. 2009. Speech and Language Processing: An Introduction to Natural Language Processing, Speech Recognition, and Computational Linguistics, 2nd edn. Upper Saddle River, NJ: Prentice-Hall.

Komlos, John. 1998. Shrinking in a growing economy? The mystery of physical stature during the industrial revolution. Journal of Economic History 58: 779–802.

Krippendorff, Klaus. 2004. Content Analysis, 2nd edn. London: Sage.

Michel, Jean-Baptiste, Shen, Yuan Kui, Presser Aiden, Aviva, Veres, Adrian, Gray, Matthew K., The Google Books Team, Pickett, Joseph P., Hoiberg, Dale, Clancy, Dan, Norvig, Peter, Orwant, Jon, Pinker, Steven, Nowak, Martin A. & Lieberman Aiden, Erez. 2011. Quantitative analysis of culture using millions of digitized books. Science 331(6014): 176–182.

Moretti, Franco. 2013. Distant Reading. London: Verso.

Oakes, Michael P. 2014. Literary Detective Work on the Computer [Natural Language Processing 12]. Amsterdam: John Benjamins.

Oxford English Dictionary. 2010. 3rd edn. Oxford: OUP.

Rayson, Paul. 2008. From key words to key semantic domains. International Journal of Corpus Linguistics 13(4): 519–549.

Sahlgren, Magnus. 2006. The Word-Space Model: Using Distributional Analysis to Represent Syntagmatic and Paradigmatic Relations between Words in High-Dimensional Vector Spaces. PhD dissertation, Stockholm University.

Schneider, Gerold. 2014. Applying Computational Linguistics and Language Models: From Descriptive Linguistics to Text Mining and Psycholinguistics. Cumulative Habilitation, University of Zurich.

Schneider, Gerold. 2018. Differences between Swiss High German and German High German via data-driven methods. In Proceedings of SwissText 2018, Mark Cieliebak, Don Tuggener & Fernando Benites (eds), 6–16. <[URL]> (30 December 2018).

Schneider, Gerold, Pettersson, Eva & Percillier, Michael. 2017. Comparing rule-based and SMT-based spelling normalisation for English historical texts. Proceedings of the NoDaLiDa 2017 Workshop on Processing Historical Language. <[URL]> (30 December 2018).

Schwartz, H. Andrew & Ungar, Lyle H. 2015. Data-driven content analysis of social media: A systematic overview of automated methods. The ANNALS of the American Academy of Political and Social Science 659(1): 78–94.

Szreter, Simon & Mooney, Graham. 1998. Urbanization, mortality, and the standard of living debate: new estimates of the expectation of life at birth in nineteenth-century British cities. Economic History Review 51(1): 84–112.

Taavitsainen, Irma & Schneider, Gerold. 2019. Scholastic argumentation in Early English medical writing and its afterlife: New corpus evidence. In From Data to Evidence in English Language Research [Language and Computers 83], Carla Suhr, Terttu Nevalainen & Irma Taavitsainen (eds), 191–221. Leiden: Brill.

Tognini-Bonelli, Elena. 2001. Corpus Linguistics at Work [Studies in Corpus Linguistics 6]. Amsterdam: John Benjamins.

Webster’s Dictionary of the English Language. 1961. Springfield, MA: Merriam-Webster Incorporated.

Wüest, Bruno, Schneider, Gerold & Amsler, Michael. 2014. Measuring the public accountability of new modes of governance. Proceedings of the ACL 2014 Workshop on Language Technologies and Computational Social Science, Baltimore, Maryland, 38–43. Stroudsburg, PA: Association for Computational Linguistics.

Yang, Li-gong, Zhu, Jian & Tang, Shi-ping. 2013. Keywords extraction based on text classification. Advanced Materials Research 765–767: 1604–1609.

Cited by (1)

Schneider, Gerold & Maud Reveilhac. 2023. Colloquialisation, compression and democratisation in British parliamentary debates. In Exploring Language and Society with Big Data [Studies in Corpus Linguistics 111], pp. 336 ff.

This list is based on CrossRef data as of 1 December 2025. Please note that it may not be complete. Sources presented here have been supplied by the respective publishers. Any errors therein should be reported to them.