In: Parallel Corpora for Contrastive and Translation Studies: New resources and applications
Edited by Irene Doval and M. Teresa Sánchez Nieto
[Studies in Corpus Linguistics 90] 2019
pp. 159–182
Discourse annotation in the MULTINOT corpus
Issues and challenges
Published online: 20 March 2019
https://doi.org/10.1075/scl.90.10lop
This chapter summarises and discusses recent work, carried out within the MULTINOT project, on the development of a bilingual (English-Spanish) corpus of original comparable and parallel texts from a variety of genres, annotated with complex linguistic features such as modality and evidentiality, metadiscourse markers, and thematization. Annotating these features in bilingual parallel texts poses important challenges at every stage of corpus development, from preprocessing to manual annotation. At the same time, it makes it possible to investigate complex linguistic research questions that could not be addressed with raw corpora, or even with the help of automatic part-of-speech tagging.
Keywords: discourse, corpus, annotation, English, Spanish
Article outline
- 1. Introduction
- 2. The MULTINOT corpus
- 3. Annotation procedure
- 3.1 Selecting the “training” corpus
- 3.2 Instantiating the theory
- 3.3 Designing annotation schemes and guidelines
- 3.4 Performing annotation experiments
- 3.5 Evaluating the annotations
- 3.6 Large-scale annotation of the whole corpus
- 4. Annotating thematization in English and Spanish
- 5. Annotating modality in English and Spanish
- 6. Annotating metadiscourse markers in English and Spanish
- 7. Summary and concluding remarks
- Acknowledgement
- Notes
- References
