Do registers have different functions for text length?: A case study of Reddit

Liimatta, Aatu

doi:10.1075/rs.22007.lii

Article published in: Register and social media
Edited by Isobelle Clarke and Jack Grieve
[Register Studies 4:2] 2022
► pp. 263–287

Aatu Liimatta | University of Helsinki

Published online: 10 November 2022

Abstract

Similar to lexical and grammatical choices, the length of a text is also guided by situational constraints and functional needs. Consequently, texts of different lengths are associated with different communicative functions. This study explores the role of register in the functions which are associated with comment lengths on the social media platform Reddit. Since registers differ in their functional and situational makeup, the same text length may also have different functions in different registers. By analyzing variation in the frequencies of register features across comment lengths in a number of popular subreddits in a large-scale dataset of Reddit comments, I show that the functional associations of text length can differ greatly between subreddits, and that comments of the same length can even have virtually opposite functions in different subreddits. Furthermore, some subregisters are clearly differentiated not only by their feature makeup but also by the length of their comments.

Keywords: text length, social media, Reddit, register analysis, functional variation

Article outline

1. Introduction
2. Data and Methods
- 2.1 Data
- 2.2 Methods
  - 2.2.1 Text-linguistic register framework
  - 2.2.2 Lengthwise analysis
3. Analysis
4. Discussion
5. Conclusion
Notes
References


Cited by (5)

Erten-Johansson, Selcen & Veronika Laippala

2025. Utilizing Text Dispersion Keyness on Turkish web registers: The case of Informational Description and Opinion. In Exploring digitally-mediated communication with corpora, ► pp. 33 ff.

Messerli, Thomas C., Daria Dayter, Sven Leuckert, Aatu Liimatta, Hanna Mahler, Axel Bohmann, Gustavo Kozma & Rafaela Tosin

2025. Digital debating cultures: communicative practices on Reddit. Digital Scholarship in the Humanities 40:1 ► pp. 227 ff.

Erten-Johansson, Selcen, Valtteri Skantsi, Sampo Pyysalo & Veronika Laippala

2024. Linguistic variation beyond the Indo-European web. Register Studies 6:1 ► pp. 60 ff.

Hiltunen, Turo

2024. Early newspapers as data for corpus linguistics (and Digital Humanities). In Challenges in corpus linguistics [Studies in Corpus Linguistics, 118], ► pp. 68 ff.

Liimatta, Aatu

2024. Text length and short texts. In Challenges in corpus linguistics [Studies in Corpus Linguistics, 118], ► pp. 106 ff.

This list is based on CrossRef data as of 30 November 2025. Please note that it may not be complete. Sources presented here have been supplied by the respective publishers. Any errors therein should be reported to them.