Article published in: Register Studies
Vol. 1:2 (2019) ► pp. 269–295
Exploring register variation on Reddit
A multi-dimensional study of language use on a social media website
Published online: 25 September 2019
https://doi.org/10.1075/rs.18005.lii
Abstract
While the language of the internet has been an increasingly popular research topic, there remain many understudied areas and topics which deserve more attention. This study explores register variation within the social media website Reddit using the multi-dimensional approach developed by Douglas Biber. Reddit, the third most popular English-language social media website after the giants Facebook and Twitter, is made up of thousands of user-created ‘subreddits’, subcommunities centered around different topics, where users make posts and comment on them. Having many different communities and topic areas under one roof makes Reddit a particularly fruitful source of research material. In this paper, three register dimensions are extracted from data collected over one month from a group of thirty-seven subreddits: ‘On-line Subjective Production’, ‘Informational Style’ and ‘Instructional Focus’. These dimensions describe register variation within Reddit in meaningful ways. They are also in line with suggested register universals (Biber, 2014).
Keywords: Reddit, social media, internet, multi-dimensional analysis
Article outline
- 1. Introduction
- 2. Multi-dimensional register studies
- 3. Overview of the data
- 4. Factor analysis and dimension scores
- 5. Interpretation of the dimensions
- 5.1 Dimension 1: On-line subjective production
- 5.2 Dimension 2: Informational style
- 5.3 Comparing dimensions 1 and 2
- 5.4 Dimension 3: Instructional focus
- 6. Discussion
- 7. Conclusion
References
Berber Sardinha, T. (2014). Comparing internet and pre-internet registers. In T. Berber-Sardinha & M. Veirano-Pinto (Eds.), Multi-dimensional analysis, 25 years on: A tribute to Douglas Biber (pp. 81–105). Amsterdam: John Benjamins.
Berber Sardinha, T., & Veirano Pinto, M. (2014). Multi-dimensional analysis, 25 years on: A tribute to Douglas Biber. Amsterdam: John Benjamins.
Biber, D. (2014). Using multi-dimensional analysis to explore cross-linguistic universals of register variation. Languages in Contrast, 14(1), 7–34.
Biber, D., & Egbert, J. (2015). Using grammatical features for automatic register identification in an unrestricted corpus of documents from the open web. Journal of Research Design and Statistics in Linguistics and Communication Science, 2(1), 3–36.
Biber, D., & Egbert, J. (2016). Register variation on the searchable web: A multi-dimensional analysis. Journal of English Linguistics, 44(2), 95–137.
Biber, D., & Gray, B. (2013). Being specific about historical change: The influence of sub-register. Journal of English Linguistics, 41, 104–134.
Biber, D., & Kurjian, J. (2007). Towards a taxonomy of web registers and text types: A multi-dimensional analysis. In M. Hundt, N. Nesselhauf, & C. Biewer (Eds.), Corpus Linguistics and the Web (pp. 109–132). Amsterdam: Rodopi.
Chandrasekharan, E., Pavalanathan, U., Srinivasan, A., Glynn, A., Eisenstein, J., & Gilbert, E. (2017). You can’t stay here: The effectiveness of Reddit’s 2015 ban through the lens of hate speech. Proceedings of the ACM on Human-Computer Interaction, 1.
Cole, J. R., Ghafurian, M., & Reitter, D. (2017, November 13). Is word adoption a grassroots process? An analysis of Reddit communities. In D. Lee, Y. R. Osgood, & R. Thomson (Eds.), International Conference on Social Computing, Behavioral-Cultural Modeling and Prediction and Behavior Representation in Modeling and Simulation (pp. 236–241). Berlin: Springer.
Collot, M., & Belmore, N. (1996). Electronic language: A new variety of English. In S. C. Herring (Ed.), Computer-mediated communication (pp. 13–28). Amsterdam/Philadelphia: John Benjamins.
Conrad, S., & Biber, D. (Eds.). (2001). Variation in English: Multi-dimensional studies. Harlow: Pearson Education.
Coscia, M. (2018). Popularity spikes hurt future chances for viral propagation of protomemes. Communications of the ACM, 61(1), 70–77.
Davies, M. (2016). Corpus of Online Registers of English (CORE). Available from <[URL]>
De Choudhury, M., & De, S. (2015). Mental health discourse on Reddit: Self-disclosure, social support, and anonymity. In Proceedings of the 8th International Conference on Weblogs and Social Media, ICWSM 2014 (pp. 71–80).
Egbert, J., Biber, D., & Davies, M. (2015). Developing a bottom-up, user-based method of web register classification. Journal of the Association for Information Science and Technology, 66(9), 1817–1831.
Eisenstein, J. (2013). What to do about bad language on the internet. In Proceedings of the North American chapter of the Association for Computational Linguistics (NAACL) 2013 (pp. 359–369).
Finlay, S. C. (2014). Age and gender in Reddit commenting and success. Journal of Information Science Theory and Practice, 2(3), 18–28.
Friginal, E. (2013). Twenty-five years of Biber’s multi-dimensional analysis [Special Issue]. Corpora, 8(2).
Gkotsis, G., Oellrich, A., Hubbard, T., & Dobson, R. (2016). The language of mental health problems in social media. In Proceedings of the Third Workshop on Computational Linguistics and Clinical Psychology (pp. 63–73). Stroudsburg, PA: Association for Computational Linguistics.
Grieve, J., Biber, D., Friginal, E., & Nekrasova, T. (2011). Variation among blog text types: A multi-dimensional analysis. In A. Mehler, S. Sharoff, & M. Santini (Eds.), Genres on the web: Corpus studies and computational models (pp. 302–322). New York, NY: Springer.
Haralabopoulos, G., Anagnostopoulos, I., & Zeadally, S. (2015). Lifespan and propagation of information in on-line social networks: A case study based on Reddit. Journal of Network and Computer Applications, 56, 88–100.
Hardy, J., & Friginal, E. (2012). Filipino and American online communication and linguistic variation. World Englishes, 31(1), 1–19.
Hess, C. W., Haug, H. T., & Landry, R. G. (1989). The reliability of type-token ratios for the oral language of school age children. Journal of Speech and Hearing Research, 32, 536–540.
Hess, C. W., Sefton, K. M., & Landry, R. G. (1986). Sample size and type-token ratios for oral language of preschool children. Journal of Speech and Hearing Research, 29, 129–134.
Huang, Y., Guo, D., Kasakoff, A., & Grieve, J. (2016). Understanding US regional linguistic variation with Twitter data analysis. Computers, Environment and Urban Systems, 59, 244–255.
Jonsson, E. (2015). Conversational writing: A multidimensional study of synchronous and supersynchronous computer-mediated communication. Frankfurt: Peter Lang.
Literat, I., & van den Berg, S. (2017). Buy memes low, sell memes high: vernacular criticism and collective negotiations of value on Reddit’s MemeEconomy. Information, Communication & Society.
Manning, C. D., Surdeanu, M., Bauer, J., Finkel, J., Bethard, S. J., & McClosky, D. (2014). The Stanford CoreNLP natural language processing toolkit. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations (pp. 55–60).
McEwan, B. (2016). Communication of communities: Linguistic signals of online groups. Information, Communication & Society, 19(9), 1233–1249.
Munro, R., & Manning, C. D. (2012). Short message communications: Users, topics, and in-language processing. In ACM DEV ’12 Proceedings of the 2nd ACM Symposium on Computing for Development.
Park, A., & Conway, M. (2018). Examining thematic similarity, difference, and membership in three online mental health communities from Reddit: A text mining and visualization approach. Computers in Human Behavior, 78, 98–112.
Pavalanathan, U., Fitzpatrick, J., Kiesling, S. F., & Eisenstein, J. (2017). A multidimensional lexicon for interpersonal stancetaking. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (pp. 884–895).
Revelle, W. (2017). psych: Procedures for psychological, psychometric, and personality research (Version 1.7.5). Illinois, USA: Northwestern University. Retrieved from <[URL]>
Richterich, A. (2014). ‘Karma, precious karma!’ Karmawhoring on Reddit and the front page’s econometrisation. Journal of Peer Production, 4. Retrieved from <[URL]>
Schnoebelen, T. (2012). Do you smile with your nose? Stylistic variation in Twitter emoticons. University of Pennsylvania Working Papers in Linguistics, 18(2), 115–125.
Singer, P., Ferrara, E., Kooti, F., Strohmaier, M., & Lerman, K. (2016). Evidence of online performance deterioration in user sessions on Reddit. PLoS ONE, 11(8).
Stewart, I., & Eisenstein, J. (2018, February 21). Making “fetch” happen: The influence of social and linguistic context on the success of lexical innovations. arXiv:1709.00345v3 [cs.CL].
Titak, A., & Roberson, A. (2013). Dimensions of web registers: An exploratory multi-dimensional comparison. Corpora, 8(2), 239–271.
Cited by (18)
Coussé, Evie & Yvonne Adesam
Erten-Johansson, Selcen & Veronika Laippala
Frenken, Florian
Grieve, Jack, Sara Bartl, Matteo Fuoli, Jason Grafmiller, Weihang Huang, Alejandro Jawerbaum, Akira Murakami, Marcus Perlman, Dana Roemling & Bodo Winter
Schneider, Carolin
Dixon, Daniel H.
2024. Measuring the linguistic similarity of discourse from open-world role-playing games to the real world through an additive multidimensional analysis. Register Studies 6:1 ► pp. 1 ff.
Erten-Johansson, Selcen, Valtteri Skantsi, Sampo Pyysalo & Veronika Laippala
Tao, Xuelian & Vahid Aryadoust
Berber Sardinha, Tony
Biri, Ylva
Biri, Ylva
Biri, Ylva
2024. Personal conviction against general knowledge. In Self- and Other-Reference in Social Contexts [Pragmatics & Beyond New Series, 342], ► pp. 14 ff.
Ehret, Katharina & Maite Taboada
Liimatta, Aatu
2020. Using lengthwise scaling to compare feature frequencies across text lengths on Reddit. In Corpus approaches to social media [Studies in Corpus Linguistics, 98], ► pp. 111 ff.
Liimatta, Aatu
Liimatta, Aatu
2023. Register variation across text lengths. International Journal of Corpus Linguistics 28:2 ► pp. 202 ff.
Liimatta, Aatu
2024. Text length and short texts. In Challenges in corpus linguistics [Studies in Corpus Linguistics, 118], ► pp. 106 ff.
This list is based on CrossRef data as of 30 November 2025. Please note that it may not be complete. Sources presented here have been supplied by the respective publishers. Any errors therein should be reported to them.
