Article published in: International Journal of Corpus Linguistics: Online-First Articles
Is human translation more conservative than machine translation?
A corpus-based study measuring formality across translation varieties and registers
Published online: 14 November 2025
https://doi.org/10.1075/ijcl.24048.li
Abstract
The present study investigates whether conservatism is present in texts translated from Chinese into English by humans and
by machines, and whether this tendency is consistently observable across registers and multiple lexico-grammatical
features, by applying profile-based correspondence analysis and mixed-effects logistic regression modelling. The results reveal
that human translation is characterised by a higher level of conservatism than both machine translation and original writing,
irrespective of register and lexico-grammatical feature. In contrast, machine translation tends to be more conservative than
non-translations only in journalistic and fictional texts, and the degree of conservatism varies across machine translation
platforms. These findings suggest that human translators are more risk-averse than original writers, providing strong
support for the risk aversion hypothesis. Moreover, the lack of understanding of translation norms or standards in machine
translation, together with its linguistic differences from human translation, points to the immense potential of future
human-machine collaborative translation models.
Article outline
- 1. Introduction
- 2. Defining conservatism and exploring human-machine translation differences
- 2.1 The concept of conservatism revisited
- 2.2 Linguistic distinctions between human and machine translation
- 2.3 Research questions and hypotheses
- 3. Research design
- 3.1 Corpora compilation
- 3.2 Data extraction
- 3.3 Multivariate analysis
- 4. Results and discussion
- 4.1 Profile-based correspondence analysis
- 4.2 Mixed-effects logistic regression analysis
- 4.3 Discussion
- 5. Conclusion
- Acknowledgements
- Notes
- References