Article published in: International Journal of Corpus Linguistics
Vol. 2:2 (1997) ► pp.199–237
Text Categories and Where You Can Stick Them
A Crude Formality Index
Published online: 1 January 1997
https://doi.org/10.1075/ijcl.2.2.04sig
This paper applies principal components analysis (PCA) to the problem of interpreting pre-existing corpus text categories for the analysis of linguistic variation. The method is illustrated by constructing an index of the complex notion "formality" from a PCA of a set of high-frequency wordform-based counts. The first principal component from this analysis acts as a broad formality index; a second principal component is tentatively identified as marking "concrete facts" versus "abstract discussion". Subsequently, text categories from the corpora are positioned on these textual dimensions, and selected categories are evaluated for internal consistency by comparing the distribution of texts across subcategories. Finally, suggestions are made concerning further developments and applications of the method used here, and its implications for the use of corpora in variation studies.
Keywords: Factor Analysis, Formality, Text Typology
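As a rough illustration of the kind of analysis the abstract describes, the following minimal Python sketch runs a PCA on standardized wordform rates and averages the component scores by text category. It is not the paper's actual procedure: the four wordform columns, the two category labels, and all the numbers are invented toy values, and the function names `pca_scores` and `category_means` are hypothetical. The sign of a principal component is arbitrary, so which end of the first component counts as "formal" has to be read off the loadings.

```python
import numpy as np

def pca_scores(counts, n_components=2):
    """PCA via SVD of column-standardized data.

    Returns (scores, loadings, explained_ratio) for the leading components.
    """
    X = np.asarray(counts, dtype=float)
    # Standardize each wordform column so frequent words do not dominate.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]   # one row per text
    loadings = Vt[:n_components].T                    # one row per wordform
    explained = (s ** 2) / np.sum(s ** 2)
    return scores, loadings, explained[:n_components]

def category_means(scores, labels):
    """Mean component scores per text category (positions categories on the dimensions)."""
    labels = np.asarray(labels)
    return {str(lab): scores[labels == lab].mean(axis=0) for lab in np.unique(labels)}

# Invented toy data: rates per 1,000 words for four hypothetical wordforms
# (columns: "the", "of", "I", "you"); rows are individual texts.
counts = [
    [60, 35, 5, 3],
    [58, 33, 6, 4],
    [62, 36, 4, 2],
    [35, 15, 40, 30],
    [33, 14, 42, 33],
    [36, 16, 38, 29],
]
# Hypothetical category labels, one per text.
labels = ["press", "press", "press",
          "conversation", "conversation", "conversation"]

scores, loadings, explained = pca_scores(counts)
means = category_means(scores, labels)

print("share of variance per component:", explained)
for name, mean in means.items():
    print(name, "mean score on first component:", mean[0])
```

In this toy setting the two categories fall at opposite ends of the first component. Comparing the spread of individual text scores within a category would be one way to approach the internal-consistency check mentioned in the abstract.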
