ProtAnt: A tool for analysing the prototypicality of texts

Anthony, Laurence; Baker, Paul

doi:10.1075/ijcl.20.3.01ant

Article published in: International Journal of Corpus Linguistics
Vol. 20:3 (2015) ► pp. 273–292


Laurence Anthony | Waseda University

Paul Baker | Lancaster University

Available under the Creative Commons Attribution (CC BY) 4.0 license.

For any use beyond this license, please contact the publisher at rights@benjamins.nl.

Published online: 28 August 2015


Corpus-based researchers and traditional qualitative researchers, such as those interested in critical discourse analysis, often need to select prototypical texts for close reading: texts that contain the language features of interest found across a much larger corpus. Traditional approaches to this selection procedure have been largely ad hoc. In this paper, we offer a more principled way of selecting texts for close reading, based on ranking texts by the number of keywords they contain. To facilitate this analysis, we have developed a multiplatform, freeware software tool called ProtAnt that analyses the texts, generates a ranked list of keywords based on statistical significance and effect size, and then orders the texts by the number of keywords in them. We describe several experiments demonstrating that ProtAnt analysis is effective not only at identifying prototypical texts but also at identifying outlier texts that may need to be removed from a target corpus.
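The ranking procedure summarized in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not ProtAnt's own implementation: the abstract does not name the keyness statistic, the effect-size measure, or the thresholds, so the sketch assumes log-likelihood with a p < 0.05 cut-off (critical value 3.84), a log-ratio effect size of at least 1 (relative frequency at least double that in the reference corpus), a naive tokenizer, and a score equal to the number of distinct keywords in each text with no length normalization. All function names are hypothetical.

```python
import math
import re
from collections import Counter

# Assumed significance threshold: log-likelihood critical value for p < 0.05 (1 d.f.).
LL_CRITICAL = 3.84


def tokenize(text):
    """Hypothetical, deliberately naive tokenizer: lower-cased alphabetic runs."""
    return re.findall(r"[a-z]+", text.lower())


def log_likelihood(a, b, c, d):
    """Log-likelihood (G2) for a word occurring a times in a target corpus of
    c tokens and b times in a reference corpus of d tokens."""
    e1 = c * (a + b) / (c + d)  # expected frequency in the target corpus
    e2 = d * (a + b) / (c + d)  # expected frequency in the reference corpus
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll


def log_ratio(a, b, c, d):
    """Assumed effect-size measure: binary log of the ratio of relative
    frequencies, with a zero count replaced by 0.5 to avoid division by zero."""
    return math.log2(((a or 0.5) / c) / ((b or 0.5) / d))


def keywords(target_texts, reference_texts, min_log_ratio=1.0):
    """Return {word: LL} for words that are significantly more frequent in the
    target corpus and at least 2**min_log_ratio times as frequent (relatively)."""
    tgt = Counter(w for text in target_texts for w in tokenize(text))
    ref = Counter(w for text in reference_texts for w in tokenize(text))
    c, d = sum(tgt.values()), sum(ref.values())
    result = {}
    for word, a in tgt.items():
        b = ref[word]  # Counter returns 0 for words absent from the reference
        ll = log_likelihood(a, b, c, d)
        if ll >= LL_CRITICAL and log_ratio(a, b, c, d) >= min_log_ratio:
            result[word] = ll
    return result


def rank_texts(target_texts, reference_texts):
    """Rank target texts by how many distinct keywords each contains
    (most prototypical first); returns (text_index, keyword_count) pairs."""
    kws = keywords(target_texts, reference_texts)
    counts = [
        (i, sum(1 for w in set(tokenize(text)) if w in kws))
        for i, text in enumerate(target_texts)
    ]
    return sorted(counts, key=lambda item: -item[1])


target = [
    "refugees asylum seekers flood border refugees asylum",
    "asylum seekers border",
    "the weather is nice today",
]
reference = ["the weather is nice today and the cat sat on the mat"] * 5
print(rank_texts(target, reference))  # [(0, 4), (1, 3), (2, 0)]
```

In this toy run, "flood" occurs only once in the target texts and falls below the assumed significance threshold, so text 0 scores 4 rather than 5. Text 2, which contains no keywords, sits at the bottom of the ranking, which is where the abstract suggests outlier texts will surface. Whether to normalize keyword counts for text length is a further design choice that this sketch leaves out.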

Keywords: prototypicality, keywords, ProtAnt, qualitative research, critical discourse analysis

References (31)

Anthony, L. (2014). AntConc (Version 3.4.3) [Computer Software]. Tokyo, Japan: Waseda University. Retrieved from [URL] (last accessed May 2015).

Anthony, L., & Baker, P. (2015). ProtAnt (Version 1.0) [Computer Software]. Tokyo, Japan: Waseda University. Retrieved from [URL] (last accessed May 2015).

Bahrololoum, A., Nezamabadi-pour, H., Bahrololoum, H., & Saeed, M. (2012). A prototype classifier based on gravitational search algorithm. Applied Soft Computing, 12(2), 819–825.

Baker, P. (2009). The BE06 Corpus of British English and recent language change. International Journal of Corpus Linguistics, 14(3), 312–337.

Baker, P., Gabrielatos, C., & McEnery, T. (2013). Discourse Analysis and Media Attitudes: The Representation of Islam in the British Press. Cambridge, UK: Cambridge University Press.

Caldas-Coulthard, C.R., & van Leeuwen, T. (2013). Teddy bear stories. In R. Wodak (Ed.), Critical Discourse Analysis Volume II: Methodologies (pp. 35–60). Los Angeles, CA: Sage. (Original work published 2003).

Chen, L., Guo, G., & Wang, K. (2011). Class-dependent projection based method for text categorization. Pattern Recognition Letters, 32(10), 1493–1501.

Chouliaraki, L. (2013). Political discourse in the news: Democratizing responsibility or aestheticizing politics? In R. Wodak (Ed.), Critical Discourse Analysis Volume II: Methodologies (pp. 97–118). Los Angeles, CA: Sage. (Original work published 2000).

Damerau, F.J. (1993). Generating and evaluating domain-oriented multi-word terms from texts. Information Processing and Management, 29(4), 433–447.

Durfee, A., Visa, A., Vanharanta, H., Schneberger, S., & Back, B. (2007). Mining text with the Prototype-matching method. Information Resources Management Journal, 20(3), 19–31.

Ehrlich, S.Z., & Blum-Kulka, S. (2013). Peer talk as a ‘double opportunity space’: The case of argumentative discourse. In R. Wodak (Ed.), Critical Discourse Analysis Volume II: Methodologies (pp. 145–168). Los Angeles, CA: Sage. (Original work published 2010).

Fayed, H.A., Hashem, S.R., & Atiya, A.F. (2007). Self-generating prototypes for pattern classification. Pattern Recognition, 40(5), 1498–1509.

Gabrielatos, C., & Baker, P. (2008). Fleeing, sneaking, flooding: A corpus analysis of discursive constructions of refugees and asylum seekers in the UK Press (1996–2005). Journal of English Linguistics, 36(1), 5–38.

Gavriely-Nuri, D. (2013). If both opponents “extend hands in peace”, why don’t they meet? Mythic metaphors and cultural codes in the Israeli peace discourse. In R. Wodak (Ed.), Critical Discourse Analysis Volume II: Methodologies (pp. 169–186). Los Angeles, CA: Sage. (Original work published 2010).

Gries, S. Th. (2003). Towards a corpus-based identification of prototypical instances of constructions. Annual Review of Cognitive Linguistics, 11, 1–27.

Hardie, A. (2014). CQPWeb (Version 3.1.10) [Computer Software]. Lancaster, UK: Lancaster University. Retrieved from [URL] (last accessed May 2015).

Khosravinik, M. (2010). The representation of refugees, asylum seekers and immigrants in British newspapers: A critical discourse analysis. Journal of Language and Politics, 9(1), 1–28.

Kloptchenko, A., Back, B., Visa, A., Toivonen, J., & Vanharanta, H. (2002). Toward content based retrieval from scientific text corpora. In Proceedings of the 2002 IEEE International Conference on Artificial Intelligence Systems (ICAIS), Divnomorskoe, Russia, 5–10 September 2002 (pp. 444–449). Washington, DC, USA: IEEE Computer Society.

Kloptchenko, A., Magnusson, C., Back, B., Visa, A., & Vanharanta, H. (2004). Mining textual contents of financial reports. The International Journal of Digital Accounting Research, 4(7), 1–29.

Labov, W. (1973). The boundaries of words and their meanings. In J. Fishman (Ed.), New Ways of Analyzing Variation in English (pp. 340–373). Washington, DC: Georgetown University Press.

Leńko-Szymańska, A. (2006). The curse and blessing of mobile phones: A corpus-based study into American and Polish rhetorical conventions. In A. Wilson, D. Archer & P. Rayson (Eds.), Corpus Linguistics around the World (pp. 141–151). London, UK: Rodopi.

Machin, D., & Suleman, U. (2013). Arab and American computer war games: The influence of a global technology on discourse. In R. Wodak (Ed.), Critical Discourse Analysis Volume II: Methodologies (pp. 229–252). Los Angeles, CA: Sage. (Original work published 2006).

Manning, C.D., Raghavan, P., & Schütze, H. (2008). Introduction to Information Retrieval. Cambridge, UK: Cambridge University Press.

Potts, A., & Baker, P. (2012). Does semantic tagging identify cultural change in British and American English? International Journal of Corpus Linguistics, 17(3), 295–324.

Rosch, E. (1975). Cognitive representations of semantic categories. Journal of Experimental Psychology: General, 104(3), 192–233.

Sajid, F. (2013). Critical discourse analysis of news headline about Imran Khan’s peace march towards Wazaristan. Journal of Humanities and Social Science, 7(3), 18–24.

Scott, M. (2014). WordSmith Tools (Version 6) [Computer Software]. Liverpool, UK: Lexical Analysis Software. Retrieved from [URL] (last accessed May 2015).

van Leeuwen, T. (1996). The representation of social actors. In C.R. Caldas-Coulthard & M. Coulthard (Eds.), Texts and Practices (pp. 32–70). London, UK: Routledge.

Visa, A., Toivonen, J., Vanharanta, H., & Back, B. (2001). Prototype matching: Finding meaning in the books of the bible. In Proceedings of the 34th Annual Hawaii International Conference on System Sciences (HICSS-34), Hawaii, USA, 3–6 January 2001 (p. 3002). Washington, DC, USA: IEEE Computer Society.

Widdowson, H.G. (2004). Text, Context, Pretext: Critical Issues in Discourse Analysis. Oxford, UK: Blackwell.

Wodak, R. (2013). Critical Discourse Analysis. Los Angeles, CA: Sage.

Cited by (25)


Candarli, Duygu & Alice Deignan

2025. Rhetorical moves in teachers’ PowerPoint presentations: Variation across disciplines and school stages. Journal of English for Academic Purposes 76 ► pp. 101532 ff.

Chen, Ruina, Zhuojun Zhong, Xinyu Yuan & Haitao Liu

2025. Two sides of the same coin? Cross-linguistic sentiment comparison and thematic discovery of reader’s reception of Wolf Totem. Digital Scholarship in the Humanities 40:1 ► pp. 40 ff.

Bednarek, Monika, Martin Schweinberger & Kelvin K. H. Lee

2024. Corpus-based discourse analysis: from meta-reflection to accountability. Corpus Linguistics and Linguistic Theory 20:3 ► pp. 539 ff.

Hanks, Elizabeth, Brett Hashimoto & Jesse Egbert

2024. The contracts word list: Integral vocabulary for reading and writing English contracts. English for Specific Purposes 75 ► pp. 37 ff.

Ireland, Katherine Ann

2024. Review of Price (2022): The language of mental illness: Corpus linguistics and the construction of mental illness in the press. International Journal of Corpus Linguistics 29:4 ► pp. 617 ff.

Bremner, Flo

2023. Reacting to Black Lives Matter: The discursive construction of racism in UK newspapers. Politics 43:3 ► pp. 298 ff.

Irschara, Karoline

2023. Using a Corpus-Assisted Discourse Studies Approach to Analyse Gender: A Case Study of German Radiology Reports. Gender a výzkum / Gender and Research 23:2 ► pp. 114 ff.

Watanabe, Hideo

2023. The discursive construction of a conflict: a case of disputed islands in the East China Sea. Text & Talk 43:3 ► pp. 333 ff.

Fernández, Julieta

2022. Corpus linguistics in L2 pragmatics research. Applied Pragmatics 4:2 ► pp. 178 ff.

McKeown, Jamie

2022. Book Review. Applied Corpus Linguistics 2:3 ► pp. 100034 ff.

Mockler, Nicole & Elizabeth Redpath

2022. Shoring Up “Teacher Quality”: Media Discourses of Teacher Education in the United Kingdom, United States, and Australia. In The Palgrave Handbook of Teacher Education Research ► pp. 1 ff.

Mockler, Nicole & Elizabeth Redpath

2023. Shoring Up “Teacher Quality”: Media Discourses of Teacher Education in the United Kingdom, United States, and Australia. In The Palgrave Handbook of Teacher Education Research ► pp. 933 ff.

Tang, Chris

2022. ‘Amber Alert’ or ‘Heatwave Warning’: The Role of Linguistic Framing in Mediating Understandings of Early Warning Messages about Heatwaves and Cold Spells. Applied Linguistics 43:2 ► pp. 227 ff.

Zhang, Weiyu & Yin Ling Cheung

2022. The Hierarchy of News Values – A Corpus-Based Diachronic and Cross-Cultural Comparison of News Reporting on Epidemics. Journalism Studies 23:3 ► pp. 281 ff.

Hocking, Darryl

2021. Artist’s statements, ‘how to guides’ and the conceptualisation of creative practice. English for Specific Purposes 62 ► pp. 103 ff.

Lienen, Carmen Sarah & J. Christopher Cohrs

2021. Redefining the Meaning of Negative History in Times of Sociopolitical Change: A Social Creativity Approach. Political Psychology 42:6 ► pp. 941 ff.

Pollak, Calvin

2021. Legitimation and Textual Evidence: How the Snowden Leaks Reshaped the ACLU’s Online Writing About NSA Surveillance. Written Communication 38:3 ► pp. 380 ff.

Egbert, Jesse, Tove Larsson & Douglas Biber

2020. Doing Linguistics with a Corpus.

Kania, Ursula

2020. Marriage for all (‘Ehe fuer alle’)?! A corpus-assisted discourse analysis of the marriage equality debate in Germany. Critical Discourse Studies 17:2 ► pp. 138 ff.

Mockler, Nicole

2020. Discourses of teacher quality in the Australian print media 2014–2017: a corpus-assisted analysis. Discourse: Studies in the Cultural Politics of Education 41:6 ► pp. 854 ff.

Mockler, Nicole

2025. Accounting for teachers: changing representations of education in The Australian Financial Review 1993–2022. Educational Review 77:6 ► pp. 1778 ff.

Wang, Feng (Robin) & Philippe Humblé

2020. Readers’ perceptions of Anthony Yu’s self-retranslation of The Journey to the West. Perspectives 28:5 ► pp. 756 ff.

Dong, Jihua

2018. Baker, P., & Egbert, J. (Eds.) (2016). Triangulating Methodological Approaches in Corpus Linguistic Research. International Journal of Corpus Linguistics 23:3 ► pp. 375 ff.

Gianfreda, Stella

2018. Politicization of the refugee crisis?: a content analysis of parliamentary debates in Italy, the UK, and the EU. Italian Political Science Review/Rivista Italiana di Scienza Politica 48:1 ► pp. 85 ff.

Turner, Georgina, Sara Mills, Isabelle van der Bom, Laura Coffey-Glover, Laura L Paterson & Lucy Jones

2018. Opposition as victimhood in newspaper debates about same-sex marriage. Discourse & Society 29:2 ► pp. 180 ff.

This list is based on CrossRef data as of 12 December 2025. Please note that it may not be complete. Sources presented here have been supplied by the respective publishers. Any errors therein should be reported to them.