In: Reference and Identity in Public Discourses
Edited by Ursula Lutzky and Minna Nevala
[Pragmatics & Beyond New Series 306] 2019
pp. 127–158
“Thanks for the donds”
A corpus linguistic analysis of topic-based communities in the comment section of The Guardian
Published online: 21 October 2019
https://doi.org/10.1075/pbns.306.05keh
Abstract
In this chapter we use corpus linguistic techniques to analyse reader comments on the website of the UK newspaper The Guardian. Our research is based on a corpus containing over 500,000 articles published on the website from 2007 to 2010, along with all 6.2 million comments made on those articles. We examine the distribution of comments across articles and topics before going on to explore the commenting behaviour of individual people. The Guardian website is a public forum with over 20 million unique visitors per month, yet, as we demonstrate, it is also an online space in which people are able to build distinct sub-communities around particular topics and interact regularly on an individual basis without necessarily knowing one another’s true identity.
Keywords: blogs, comments, journalism, corpus linguistics, collocation, community, addressivity, reference, identity, topic
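The abstract describes first measuring how comments are distributed across articles and topics and then profiling individual commenters. The Python below is a minimal sketch of that kind of tally, not the authors' pipeline: it assumes a hypothetical `Comment` record holding an article ID, a site section used as a rough stand-in for topic, and a commenter screen name. The field names, function names and toy data are all invented for illustration.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Comment:
    # Hypothetical record layout, assumed for illustration only.
    article_id: str
    section: str    # site section, used here as a rough proxy for topic
    commenter: str  # screen name, not a real-world identity


def comments_per_section(comments):
    """Return (section, count, share of all comments), most-commented first."""
    counts = Counter(c.section for c in comments)
    total = sum(counts.values())
    return [(section, n, n / total) for section, n in counts.most_common()]


def mean_comments_per_article(comments):
    """Average comments per article, over articles that received any comments."""
    per_article = Counter(c.article_id for c in comments)
    return sum(per_article.values()) / len(per_article) if per_article else 0.0


def sections_per_commenter(comments):
    """Number of distinct sections each commenter has posted in."""
    seen = defaultdict(set)
    for c in comments:
        seen[c.commenter].add(c.section)
    return {name: len(sections) for name, sections in seen.items()}


if __name__ == "__main__":
    # Toy data, invented for the example.
    sample = [
        Comment("a1", "politics", "alice"),
        Comment("a1", "politics", "bob"),
        Comment("a2", "politics", "alice"),
        Comment("a3", "sport", "carol"),
        Comment("a3", "sport", "alice"),
    ]
    print(comments_per_section(sample))
    print(mean_comments_per_article(sample))
    print(sections_per_commenter(sample))
```

Under these assumptions, commenters who concentrate their posts in one or two sections are natural starting points when looking for topic-based sub-communities.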
Article outline
- 1. Introduction
- 2. The relationship between comments and article topic
- 2.1 Distribution of comments across Guardian sections
- 2.2 Distribution of comments by article topic
- 3. Commenter behaviour
- 3.1 Interactions between commenters
- 3.2 Topic-based commenter communities
- 3.3 The nature of user interactions
- 3.4 Argumentation and discussion
- 3.5 Terms of address
- 3.6 Dond(s): An example of in-group lexis
- 4. Conclusion
- Notes
- References
- Appendix
Cited by (1)
Kehoe, Andrew, Matt Gee & Antoinette Renouf. 2022. A data-driven approach to finding significant changes in language use through time series analysis. In Broadening the Spectrum of Corpus Linguistics [Studies in Corpus Linguistics, 105], pp. 285 ff.
This list is based on CrossRef data as of 28 November 2025 and may not be complete.
