Article published in: International Journal of Corpus Linguistics
Vol. 21:2 (2016), pp. 165–191
Your blog is (the) shit
A corpus linguistic approach to the identification of swearing in computer mediated communication
Published online: 8 September 2016
https://doi.org/10.1075/ijcl.21.2.02lut
Research on swearing has increased over the last decade and has diversified to include a wider range of data and methods of analysis. Nevertheless, certain types of data, and specifically large corpora of computer-mediated communication (CMC), have not been studied extensively. In this paper, we address this gap by studying the use of swearwords in blog data and by illustrating how swearing can be identified in a large corpus by taking context into account. This approach, based on examining the shared and unique collocates of known expletives, helps distinguish swearing from non-swearing uses of polysemous lexemes and supports the analysis of overlaps in the usage and meaning of swearwords. The work thus goes beyond basic sentiment analysis and offers new insights into how collocation can be used to refine profanity filters, an issue of growing importance as online interaction becomes more widespread.
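The collocate-comparison idea can be illustrated with a minimal sketch. This is not the authors' implementation, and several choices here are assumptions. The sketch uses a ±2-token window, raw co-occurrence counts, a small hypothetical stoplist, and a handful of invented toy sentences rather than blog data. It collects the collocates of a known expletive and of a polysemous candidate, splits them into shared and candidate-only sets, and flags an occurrence of the candidate as swearing when its context contains a shared collocate. The helper names (`window_span`, `collocates`, `is_swearing_use`) are hypothetical.

```python
from collections import Counter

# Hypothetical stoplist (an assumption): function words are dropped so that
# shared collocates reflect lexical context rather than grammar.
STOPWORDS = {"a", "after", "and", "had", "he", "is", "me", "of", "so",
             "that", "the", "this", "was", "what"}


def window_span(tokens: list[str], i: int, window: int) -> list[str]:
    """Return the tokens within +/- `window` positions of index i, excluding i."""
    lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
    return [tokens[j] for j in range(lo, hi) if j != i]


def collocates(sentences: list[list[str]], node: str, window: int = 2,
               min_freq: int = 1) -> set[str]:
    """Content words co-occurring with `node` at least `min_freq` times."""
    counts: Counter[str] = Counter()
    for tokens in sentences:
        for i, tok in enumerate(tokens):
            if tok == node:
                counts.update(w for w in window_span(tokens, i, window)
                              if w not in STOPWORDS)
    return {w for w, c in counts.items() if c >= min_freq}


def is_swearing_use(tokens: list[str], i: int, swear_context: set[str],
                    window: int = 2) -> bool:
    """Flag the token at i as swearing if its window contains a shared collocate."""
    return any(w in swear_context for w in window_span(tokens, i, window))


# Invented toy sentences (pre-tokenised, lower-cased); not corpus data.
SENTENCES = [s.split() for s in [
    "what the fucking hell is this",
    "this is so fucking stupid honestly",
    "that was fucking brilliant mate",
    "what the bloody hell is this",
    "so bloody stupid of me",
    "bloody brilliant post mate",
    "he had a bloody nose after the fight",
    "a bloody nose and a black eye",
]]

known = collocates(SENTENCES, "fucking")     # collocates of a known expletive
candidate = collocates(SENTENCES, "bloody")  # collocates of a polysemous lexeme
shared = known & candidate                   # overlap suggests swearing usage
unique = candidate - known                   # candidate-only contexts (e.g. literal sense)

# One label per occurrence of "bloody", in corpus order.
labels = [is_swearing_use(toks, i, shared)
          for toks in SENTENCES for i, t in enumerate(toks) if t == "bloody"]
print(shared, unique, labels)
```

On this toy input, the shared set is `{"hell", "stupid", "brilliant"}`. The two literal "bloody nose" occurrences are not flagged, and the three intensifying uses are. A real study would likely replace raw counts with an association measure such as mutual information or log-likelihood and would tune the window and frequency thresholds. The settings above are illustrative only.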
Keywords: swearing, collocation, CMC, blogs, pragmatics
