In: Mathematical Modelling in Linguistics and Text Analysis: Theory and applications
Edited by Adam Pawłowski, Sheila Embleton, Jan Mačutek and Aris Xanthos
[Current Issues in Linguistic Theory 370] 2025
pp. 149–160
Probabilistic regularity in translation
A quantitative description of dependency treebank of translated academic abstracts
Published online: 13 October 2025
https://doi.org/10.1075/cilt.370.13lia
Abstract
Studies of translational language are numerous, but their results are contradictory. This study
investigates the syntactic and typological properties of translational language using two main indices of dependency
grammar, dependency distance and dependency direction, with the aim of revealing one probabilistic translation
regularity: the law of interference. Machine learning methods are then used to classify translated and non-translated
texts and to further verify the regularity on larger datasets by checking classification performance. The results show
that (1) the mean dependency distance (MDD) of translated English abstracts is significantly longer than that of
non-translated English abstracts; (2) translated English abstracts contain more head-final and fewer head-initial
dependencies than non-translated ones; and (3) compared with K-Nearest Neighbor and Support Vector Machine, Random
Forest achieves satisfactory classification performance. These findings show the interference effect in translation and
suggest that the framework of dependency grammar is helpful for revealing translation regularity.
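The two dependency indices named in the abstract can be illustrated with a minimal sketch. It assumes the definitions commonly used in dependency-treebank studies, not necessarily the exact procedure of this chapter: dependency distance is the head position minus the dependent position, MDD is the mean of the absolute distances over all non-root dependencies, and a dependency counts as head-final when the head follows its dependent. The function name `dependency_stats` and the input encoding (one list of 1-based head indices per sentence, with 0 marking the root) are hypothetical choices made for illustration.

```python
from typing import Dict, List


def dependency_stats(sentences: List[List[int]]) -> Dict[str, float]:
    """Compute MDD and dependency-direction proportions for a small treebank.

    Each sentence is a list of head indices: heads[i] is the 1-based position
    of the head of word i + 1, and 0 marks the root (assumed encoding).
    """
    total_distance = 0
    head_final = 0    # head follows its dependent
    head_initial = 0  # head precedes its dependent

    for heads in sentences:
        for position, head in enumerate(heads, start=1):
            if head == 0:
                continue  # the root has no incoming dependency
            distance = head - position
            total_distance += abs(distance)
            if distance > 0:
                head_final += 1
            else:
                head_initial += 1

    n_dependencies = head_final + head_initial
    if n_dependencies == 0:
        raise ValueError("no non-root dependencies found")

    return {
        "mdd": total_distance / n_dependencies,
        "head_final": head_final / n_dependencies,
        "head_initial": head_initial / n_dependencies,
    }


# "The student read a book": The->student, student->read, read=root,
# a->book, book->read (an illustrative analysis, not data from the study).
example = [[2, 3, 0, 5, 3]]
print(dependency_stats(example))
```

In this toy sentence three of the four dependencies have the head after the dependent, so the head-final share is higher. Comparing such proportions and the MDD between translated and non-translated corpora is the kind of contrast the abstract reports.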
Article outline
- 1. Introduction
- 2. Materials and method
- 3. Results and discussion
- 3.1 Syntactic differences in dependency distance
- 3.2 Typological differences in dependency direction
- 3.3 The classification performance of machine learning methods
- 4. Conclusion
References
