In: The Functional Perspective on Language and Discourse: Applications and implications
Edited by María de los Ángeles Gómez González, Francisco José Ruiz de Mendoza Ibáñez, Francisco Gonzálvez-García and Angela Downing
[Pragmatics & Beyond New Series 247] 2014
pp. 57–86
Contrastive corpus annotation in the CONTRANOT project
Issues and problems
Published online: 16 May 2014
https://doi.org/10.1075/pbns.247.04lav
In this paper we outline a number of issues and problems that arise during
the contrastive human-coded corpus annotation of certain semantic and
discourse categories within the framework of the CONTRANOT project, which is
aimed at the creation and validation of contrastive functional descriptions
through corpus analysis and annotation. Human-coded corpus annotation is a
preliminary step in training computer algorithms to automate the annotation
of large corpora, but it can also serve as a mechanism for testing aspects of
linguistic theories empirically, for instance in theory formation and theory
redefinition, and for enriching theories with quantitative information.
The work reported in this paper focuses on the annotation of two categories,
Thematisation and Modality, to illustrate the challenges researchers face
when developing well-designed and reliable annotation procedures for complex
linguistic phenomena in a contrastive manner. We describe the annotation tasks and
procedures developed so far, which include the design of annotation schemas
on the basis of available linguistic theories and the testing of their reliability
through agreement studies. We also evaluate and discuss the results of the annotations
in terms of their relevance to the theoretical characterisation of the
investigated phenomena. We expect our work to have an impact on the area of
contrastive textual analysis and to pave the way for the development
of automated annotation systems for computational applications.
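The abstract mentions testing the reliability of annotation schemas through agreement studies but does not say which agreement coefficient is used. As an illustrative assumption only, a common choice for two coders labelling the same items is Cohen's kappa, κ = (p_o − p_e) / (1 − p_e), where p_o is the observed agreement and p_e the agreement expected by chance. The following minimal Python sketch computes it. The function name and the modality labels ("epistemic", "deontic") are hypothetical and made up for illustration; they are not CONTRANOT data or tooling.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders who labelled the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and
    p_e is the chance agreement implied by each coder's label distribution.
    """
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must label the same non-empty set of items")
    n = len(coder_a)
    # Observed agreement: proportion of items given the same label by both coders.
    p_o = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement: for each label, multiply the two coders' marginal proportions, then sum.
    freq_a = Counter(coder_a)
    freq_b = Counter(coder_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1:
        # Both coders used one identical label throughout, so kappa is undefined.
        raise ValueError("chance agreement is 1; kappa is undefined")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical modality labels for ten clauses (illustrative only, not project data).
a = ["epistemic", "deontic", "epistemic", "epistemic", "deontic",
     "epistemic", "deontic", "deontic", "epistemic", "epistemic"]
b = ["epistemic", "deontic", "deontic", "epistemic", "deontic",
     "epistemic", "deontic", "epistemic", "epistemic", "epistemic"]
print(round(cohens_kappa(a, b), 3))  # prints 0.583
```

Here the coders agree on 8 of 10 clauses (p_o = 0.80), but since both label six clauses "epistemic" and four "deontic", chance agreement is already p_e = 0.52, so kappa is 0.28 / 0.48 ≈ 0.583. Correcting raw percentage agreement for chance in this way is the reason chance-corrected coefficients are generally preferred in annotation reliability studies. Other coefficients, such as Krippendorff's alpha, extend the idea to more than two coders or to missing labels.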
