Article published in: International Journal of Corpus Linguistics
Vol. 20:1 (2015), pp. 81–101
Automatic analysis of thematic structure in written English
Published online: 30 March 2015
https://doi.org/10.1075/ijcl.20.1.04par
This paper proposes and describes a computational system for the automatic analysis of thematic structure, as defined in Systemic Functional Linguistics, in written English. The system takes an English text as input and outputs an analysis of the thematic structure of each sentence in the text. It is evaluated on data from The Wall Street Journal section of the Penn Treebank (Marcus et al. 1993) and from the British Academic Written English corpus (Gardner & Nesi 2013). An experiment on these data shows that the system is highly reliable both in identifying theme-rheme boundaries and in determining several linguistic properties of the identified themes, including syntactic nodes, theme function, markedness, mood types, and theme roles. To illustrate how the system can be used, we describe an example application that compares collections of novice and expert academic writing in terms of thematic structure.
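To make the notion of a theme-rheme boundary concrete, the sketch below applies the standard Systemic Functional Linguistics rule that a clause's theme extends up to and including its first experiential (topical) element, with any preceding textual or interpersonal elements forming part of a multiple theme. It is not the system described in the paper. It assumes the clause has already been parsed into top-level constituents with Penn Treebank-style labels, covers declarative clauses only, and relies on tiny hypothetical word lists (`TEXTUAL`, `INTERPERSONAL`) and a hypothetical function name `analyse_theme`, all introduced purely for illustration.

```python
from dataclasses import dataclass

# Hypothetical, deliberately tiny word lists for illustration only; they are
# NOT the resources used by the system described in the paper.
TEXTUAL = {"and", "but", "so", "however", "moreover", "therefore", "furthermore"}
INTERPERSONAL = {"perhaps", "probably", "surely", "unfortunately", "frankly"}
NON_EXPERIENTIAL = TEXTUAL | INTERPERSONAL
# Penn Treebank labels for commas, colons/semicolons, and sentence-final marks.
PUNCT_LABELS = {",", ":", "."}


@dataclass
class ThemeAnalysis:
    theme: list[str]     # constituents up to and including the topical theme
    rheme: list[str]     # the remainder of the clause
    topical_label: str   # syntactic label of the topical theme
    marked: bool         # True if the topical theme is not the subject


def analyse_theme(constituents: list[tuple[str, str]]) -> ThemeAnalysis:
    """Split one declarative clause into theme and rheme.

    `constituents` holds the clause's top-level (label, text) pairs in surface
    order, using Penn Treebank-style labels such as "NP-SBJ", "PP", "ADVP".
    """
    for i, (label, text) in enumerate(constituents):
        if label in PUNCT_LABELS:
            continue  # punctuation never realizes a theme
        if text.strip().lower() in NON_EXPERIENTIAL:
            continue  # textual/interpersonal theme: keep scanning
        # First experiential element found: the topical theme ends here.
        theme = [t for lab, t in constituents[: i + 1] if lab not in PUNCT_LABELS]
        rheme = [t for lab, t in constituents[i + 1:] if lab not in PUNCT_LABELS]
        # In a declarative clause the unmarked topical theme is the subject,
        # so anything else (e.g. a fronted adjunct) counts as marked.
        marked = not label.startswith("NP-SBJ")
        return ThemeAnalysis(theme, rheme, label, marked)
    raise ValueError("clause contains no experiential element")


if __name__ == "__main__":
    clause = [("ADVP", "However"), (",", ","),
              ("NP-SBJ", "the results"), ("VP", "were inconclusive")]
    print(analyse_theme(clause))
```

For the example clause, `However` is collected as a textual theme and the subject `the results` as an unmarked topical theme, leaving `were inconclusive` as the rheme. For `[("PP-TMP", "In 2010"), (",", ","), ("NP-SBJ", "the firm"), ("VP", "expanded")]` the sketch returns the theme `["In 2010"]` with `marked=True`. A simple lookup approach like this misses multi-word conjunctive adjuncts (for example "in addition") and mishandles imperatives and interrogatives, whose unmarked themes are not the subject.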
References (26)
Eggins, S. (2004). An Introduction to Systemic Functional Linguistics (2nd ed.). New York, NY: Continuum.
Gardner, S., & Nesi, H. (2013). A classification of genre families in university student writing. Applied Linguistics, 34(1), 25–52.
Ghadessy, M. (1999). Thematic organization in academic article abstract. Estudios Ingleses de la Universidad Complutense, 71, 141–161.
Gosden, H. (1995). Success in research article writing and revision: A social-constructionist perspective. English for Specific Purposes, 14(1), 37–57.
Halliday, M.A.K. (1994). An Introduction to Functional Grammar (2nd ed.). London, UK: Edward Arnold.
Halliday, M.A.K., & Matthiessen, C. (2004). An Introduction to Functional Grammar (3rd ed.). London, UK: Edward Arnold.
Hunt, K.W. (1965). Grammatical Structures Written at Three Grade Levels (NCTE research report no. 3). Urbana, IL: National Council of Teachers of English.
Hunt, K.W. (1970). Do sentences in the second language grow like those in the first? TESOL Quarterly, 4(3), 195–202.
Jalilifar, A. (2009). Thematic development in English and translated academic texts. Journal of Language and Translation, 10(1), 81–111.
Kappagoda, A. (2009). The Use of Systemic-functional Linguistics in Automated Text Mining. Edinburgh, Australia: Defence Science and Technology Organisation.
Klein, D., & Manning, C.D. (2003). Fast exact inference with a factored model for natural language parsing. In S. Becker, S. Thrun, & K. Obermayer (Eds.), Advances in Neural Information Processing Systems 15 (pp. 3–10). Cambridge, MA: MIT Press.
Lu, X. (2002). Discourse and ideology: The Taiwan issue in the Chinese and American media. In C.N. Candlin (Ed.), Research and Practice in Professional Discourse (pp. 589–608). Hong Kong: City University of Hong Kong Press.
Lu, X. (2010). Automatic analysis of syntactic complexity in second language writing. International Journal of Corpus Linguistics, 15(4), 474–496.
Marcus, M.P., Marcinkiewicz, M.A., & Santorini, B. (1993). Building a large annotated corpus of English: The Penn Treebank. Computational Linguistics, 19(2), 313–330.
Martínez, I.A. (2003). Aspects of theme in the method and discussion sections of biology journal articles in English. Journal of English for Academic Purposes, 2(2), 103–123.
McCabe, A.M. (1999). Theme and thematic patterns in Spanish and English history texts. (Unpublished doctoral dissertation). Aston University, Birmingham, UK.
North, S.P. (2005). Disciplinary variation in the use of theme in undergraduate essays. Applied Linguistics, 26(3), 431–452.
O’Halloran, K.L. (2003). Systemics 1.0: Software for research and teaching systemic functional linguistics. RELC Journal, 34(2), 155–177.
Schleppegrell, M. (2001). Linguistic features of the language of schooling. Linguistics and Education, 12(4), 431–459.
Schwarz, L., Bartsch, S., Eckart, R., & Teich, E. (2008). Exploring automatic theme identification: A rule-based approach. In A. Storrer, A. Geyken, A. Siebert, & K.-M. Würzner (Eds.), Text Resources and Lexical Knowledge. Selected Papers from the 9th Conference on Natural Language Processing (pp. 15–26). Berlin, Germany: Mouton de Gruyter.
Souter, D.C. (1996). A corpus-trained parser for systemic-functional syntax. (Unpublished doctoral dissertation). University of Leeds, Leeds, UK.
Steinberger, R., & Bennett, P. (1994). Automatic recognition of theme, focus and contrastive stress. In P. Bosch & R. van der Sandt (Eds.), Proceedings of the Interdisciplinary Conference in Celebration of the 10th Anniversary of the Journal of Semantics, 12–15 August 1994 (Vol. 11, pp. 205–214). Meinhard-Schwebda, Germany: The IBM Institute for Logic and Linguistics.
Cited by (3)
Eguchi, Masaki & Kristopher Kyle
Dontcheva-Navratilova, Olga, Renata Jančaříková, Irena Hůlková & Josef Schmied
This list is based on CrossRef data as of 12 December 2025 and may not be complete.
