Article published in: Register and social media
Edited by Isobelle Clarke and Jack Grieve
[Register Studies 4:2] 2022
pp. 263–287
Do registers have different functions for text length?
A case study of Reddit
Published online: 10 November 2022
https://doi.org/10.1075/rs.22007.lii
Abstract
Similar to lexical and grammatical choices, the length of a text is also guided by situational constraints and functional needs. Consequently, texts of different lengths are associated with different communicative functions. This study explores the role of register in the functions which are associated with comment lengths on the social media platform Reddit. Since registers differ in their functional and situational makeup, the same text length may also have different functions in different registers. By analyzing variation in the frequencies of register features across comment lengths in a number of popular subreddits in a large-scale dataset of Reddit comments, I show that the functional associations of text length can differ greatly between subreddits, and that comments of the same length can even have virtually opposite functions in different subreddits. Furthermore, some subregisters are clearly differentiated not only by their feature makeup but also by the length of their comments.
Keywords: text length, social media, Reddit, register analysis, functional variation
Article outline
- 1. Introduction
- 2. Data and Methods
- 2.1 Data
- 2.2 Methods
- 2.2.1 Text-linguistic register framework
- 2.2.2 Lengthwise analysis
- 3. Analysis
- 4. Discussion
- 5. Conclusion
- Notes
- References
