How Comparable Can 'Comparable Corpora' Be?
The development of a coherent methodology for corpus-based work in translation studies is essential for the evolution of this new field of research into a fully-fledged paradigm within the discipline. The design of a monolingual, multi-source-language comparable corpus of English as a resource for the systematic study of the nature of translated text can be regarded as an important step towards the development of such a methodology. This paper deals with a crucial and problematic aspect of the design of a monolingual comparable corpus, namely the achievement of an adequate level of comparability between its translational and non-translational components.
Table of contents
- Abstract
- 1. Introduction
- 2. The Design of the English Comparable Corpus (ECC)
- 2.1. Initial Definition of Comparable Corpus
- 2.2. Corpus Typology and Classification of the ECC
- 2.3. The Theoretical and Practical Motivation for the TEC Design
- 2.4. The Identification of the Text Categories of TEC
- 2.5. Selecting Suitable Texts for TEC
- 2.6. Design of NON-TEC
- 3. Evaluation of Comparability in the Design of ECC
- 4. Use of the English Comparable Corpus
- Notes
- References
- Résumé
- Address for correspondence
When I first read the manuscript of Mona Baker's article "Corpora in Translation Studies: An Overview and Some Suggestions for Future Research" (1995), I was inspired by the challenge of working towards developing a coherent methodology for corpus-based translation studies, because I believed then (and still do) that this is an essential step for realising the potential envisaged in this new field of research. In October 1994, I began working on the creation of a monolingual comparable corpus of English, which was conceived as a resource to be made available to the academic community for the systematic study of the linguistic nature of translated text. The present size of the corpus is 2,000,000 words and it is now in the process of being made accessible to translation scholars through the network.