In: Corpus-based Approaches to Register Variation
Edited by Elena Seoane and Douglas Biber
[Studies in Corpus Linguistics 103], 2021
pp. 291–312
Chapter 11. Measuring informativity: The rise of compounds as informationally dense structures in 20th-century Scientific English
Published online: 8 December 2021
https://doi.org/10.1075/scl.103.11deg
Abstract
By applying data-driven methods based on information theory, this study adds to previous work on the
development of the scientific register by measuring the informativity of alternative phrasal structures shown to be involved
in change in language use in 20th-century Scientific English. The analysis based on data-driven periodization shows compounds
to be distinctive grammatical structures from the 1920s onwards in Proceedings A of the Royal Society of
London. Compounds not only increase in frequency, but also show higher informativity than their less dense
prepositional counterparts. Results also show that the lower the informativity of particular items, the more likely
alternative, more informationally dense options are to be favoured (e.g., compounds over of-phrases), suggesting that
striving for communicative efficiency is one force shaping the scientific register.
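As a rough illustration of how informativity can be quantified, the Python sketch below computes surprisal in bits, S(w) = log2(1 / p(w | context)). It assumes a bigram model with unsmoothed maximum-likelihood estimates and uses a hypothetical toy sentence pair. The chapter's own language model, context window and smoothing are not specified here, so this is a sketch under those assumptions and not the authors' setup.

```python
import math
from collections import Counter

def bigram_surprisal(tokens):
    """Surprisal in bits of each token given the preceding token.

    Uses unsmoothed maximum-likelihood estimates,
    p(w | prev) = count(prev, w) / count(prev as a left context).
    This is only safe because we score exactly the bigrams we counted,
    so every probability is > 0 (an assumption of this toy sketch).
    """
    bigram_counts = Counter(zip(tokens, tokens[1:]))
    context_counts = Counter(tokens[:-1])  # every token that has a successor
    scores = []
    for prev, word in zip(tokens, tokens[1:]):
        p = bigram_counts[(prev, word)] / context_counts[prev]
        scores.append((prev, word, math.log2(1 / p)))
    return scores

# Hypothetical toy text containing an of-phrase and a compound variant.
toy = "the rate of reaction was high the reaction rate was low".split()

for prev, word, bits in bigram_surprisal(toy):
    print(f"{prev:>8} -> {word:<8} {bits:.2f} bits")
```

In this tiny sample, *reaction* after *of* is fully predictable (0 bits), while *rate* after *reaction* costs 1 bit. On a real corpus, averaging such scores over the tokens of each construction is one possible way to compare compounds with their of-phrase alternatives.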
Article outline
- 1. Introduction
- 2. Methods
- 2.1 Data
- 2.2 Data-driven periodization with Kullback-Leibler divergence
- 2.3 Determining informativity: Surprisal
- 3. Tracing change in grammatical use in 20th-century Scientific English
- 3.1 The temporal dynamics of grammatical use in 20th-century Scientific English
- 3.2 Kinds of change in grammatical use: Inspecting distinctive patterns
- 3.3 Tracing changes towards the use of informationally dense structures
- 4. Tracing the development of informationally dense structures
- 5. Conclusion
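Section 2.2 names Kullback-Leibler divergence (KLD) as the basis for data-driven periodization. The Python sketch below is a hedged illustration of the idea. It compares the feature distribution of each period with that of the preceding one, using D(P‖Q) = Σ p(w) log2(p(w) / q(w)). The period labels, feature names and counts are hypothetical placeholders, not data from the chapter. The add-alpha smoothing, the value of `alpha` and the direction of comparison are also assumptions of this sketch.

```python
import math
from collections import Counter

def kld(p_counts, q_counts, alpha=0.5):
    """Kullback-Leibler divergence D(P || Q) in bits.

    Both count distributions are smoothed with add-alpha (Lidstone)
    smoothing over their joint vocabulary, so q(w) is never zero.
    Expects Counter objects so that missing features count as 0.
    """
    vocab = set(p_counts) | set(q_counts)
    p_total = sum(p_counts.values()) + alpha * len(vocab)
    q_total = sum(q_counts.values()) + alpha * len(vocab)
    divergence = 0.0
    for w in vocab:
        p = (p_counts[w] + alpha) / p_total
        q = (q_counts[w] + alpha) / q_total
        divergence += p * math.log2(p / q)
    return divergence

def adjacent_divergences(periods):
    """D(later || earlier) for each pair of consecutive periods, in insertion order."""
    names = list(periods)
    return [(a, b, kld(periods[b], periods[a])) for a, b in zip(names, names[1:])]

# Hypothetical feature counts per period (illustrative only).
periods = {
    "period_1": Counter({"pattern_a": 60, "pattern_b": 40}),
    "period_2": Counter({"pattern_a": 58, "pattern_b": 42}),
    "period_3": Counter({"pattern_a": 40, "pattern_b": 60, "pattern_c": 5}),
}

for a, b, d in adjacent_divergences(periods):
    print(f"{a} -> {b}: {d:.4f} bits")
```

A jump in divergence between consecutive periods, here between `period_2` and `period_3`, marks a candidate period boundary. Comparing sliding windows of several past and future slices is one possible refinement.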
