Article published in: Pragmatics
Vol. 21:4 (2011), pp. 647–683
Detecting contrast patterns in newspaper articles by combining discourse analysis and text mining
Available under the Creative Commons Attribution-NonCommercial (CC BY-NC) 4.0 license.
Published online: 1 December 2011
https://doi.org/10.1075/prag.21.4.07pol
Text mining aims to construct classification models and find interesting patterns in large text collections. This paper investigates the utility of applying these techniques to media analysis, specifically to support discourse analysis of news reports about the 2007 Kenyan elections and post-election crisis in local (Kenyan) and Western (British and US) newspapers. It illustrates how text mining methods can assist discourse analysis by finding contrast patterns that provide evidence for ideological differences between local and international press coverage. Our experiments indicate that the most significant differences pertain to the interpretive frame of the news events: whereas the newspapers from the UK and the US focus on ethnicity in their coverage, the Kenyan press concentrates on sociopolitical aspects.
Keywords: Text mining, Kenyan elections, Ideology, Pragmatics, Discourse analysis
