Article published in: International Journal of Corpus Linguistics
Vol. 20:3 (2015), pp. 273–292
ProtAnt
A tool for analysing the prototypicality of texts
Available under the Creative Commons Attribution (CC BY) 4.0 license.
For any use beyond this license, please contact the publisher at rights@benjamins.nl.
Published online: 28 August 2015
https://doi.org/10.1075/ijcl.20.3.01ant
Corpus-based researchers and traditional qualitative researchers, such as those interested in critical discourse analysis, often need to select prototypical texts for close reading, that is, texts containing the language features of interest that are present in a much larger corpus. Traditional approaches to this selection procedure have been largely ad hoc. In this paper, we offer a more principled way of selecting texts for close reading: ranking the texts by the number of keywords they contain. To facilitate this analysis, we have developed a multiplatform, freeware software tool called ProtAnt that analyses the texts, generates a ranked list of keywords based on statistical significance and effect size, and then orders the texts by the number of keywords in them. We describe several experiments demonstrating that the ProtAnt analysis is effective not only at identifying prototypical texts but also at identifying outlier texts that may need to be removed from a target corpus.
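The ranking idea described in the abstract can be illustrated with a minimal sketch. It is not ProtAnt's actual implementation, and it rests on several assumptions that the abstract does not confirm: log-likelihood is used as the significance statistic, 3.84 is used as the cut-off (the chi-square critical value for p < 0.05 with one degree of freedom), and "number of keywords" is taken to mean distinct keyword types. The effect-size ranking mentioned in the abstract is left out for brevity. All function and parameter names are hypothetical.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """Log-likelihood keyness (assumed statistic, not confirmed by the source).

    a, b: frequency of the word in the target and reference corpus.
    c, d: total token counts of the target and reference corpus.
    """
    e1 = c * (a + b) / (c + d)  # expected frequency in the target corpus
    e2 = d * (a + b) / (c + d)  # expected frequency in the reference corpus
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll


def keywords(target_texts, reference_texts, min_ll=3.84):
    """Return positive keywords of the target corpus, strongest first.

    Both arguments map a text name to its list of tokens.
    """
    target = Counter(t for toks in target_texts.values() for t in toks)
    reference = Counter(t for toks in reference_texts.values() for t in toks)
    c, d = sum(target.values()), sum(reference.values())
    scored = []
    for word, a in target.items():
        b = reference.get(word, 0)
        if a / c <= b / d:
            continue  # keep only words relatively more frequent in the target
        ll = log_likelihood(a, b, c, d)
        if ll >= min_ll:
            scored.append((ll, word))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [word for _, word in scored]


def rank_texts(target_texts, keyword_list):
    """Order texts by how many distinct keywords each contains (assumption)."""
    kw = set(keyword_list)
    scores = [(name, len(kw & set(toks))) for name, toks in target_texts.items()]
    scores.sort(key=lambda pair: (-pair[1], pair[0]))
    return scores
```

Calling `rank_texts(corpus, keywords(corpus, reference))` on tokenised texts would place candidate prototypical texts at the top of the list and candidate outliers at the bottom. A real analysis would also need to decide how to tokenise and whether to normalise for text length, neither of which is specified here.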
