Is human translation more conservative than machine translation?: A corpus-based study measuring formality across translation varieties and registers

Li, Jia; Hu, Xianyao

doi:10.1075/ijcl.24048.li

Article published in: International Journal of Corpus Linguistics: Online-First Articles

Jia Li | Southwest University

Xianyao Hu | Southwest University

Published online: 14 November 2025

Abstract

Using profile-based correspondence analysis and mixed-effects logistic regression modelling, the present study investigates whether conservatism exists in human- and machine-translated texts from Chinese into English, and whether this tendency is consistently observable across different registers and multiple lexico-grammatical features. The results reveal that human translation is characterised by a higher level of conservatism than both machine translation and original writing, irrespective of register and lexico-grammatical feature. In contrast, machine translation tends to be more conservative than original writing only in journalistic and fictional texts, and its degree of conservatism varies across machine translation platforms. These findings suggest that human translators are more risk-averse than original writers, providing strong support for the risk aversion hypothesis. Moreover, machine translation's lack of understanding of translation norms or standards, together with its linguistic differences from human translation, points to the immense potential of future human-machine collaborative translation models.

Keywords: conservatism, machine translation, profile-based correspondence analysis, mixed-effects logistic regression, risk aversion

Article outline

1. Introduction
2. Defining conservatism and exploring human-machine translation differences
- 2.1 The concept of conservatism revisited
- 2.2 Linguistic distinctions between human and machine translation
- 2.3 Research questions and hypotheses
3. Research design
- 3.1 Corpora compilation
- 3.2 Data extraction
- 3.3 Multivariate analysis
4. Results and discussion
- 4.1 Profile-based correspondence analysis
- 4.2 Mixed-effects logistic regression analysis
- 4.3 Discussion
5. Conclusion
Acknowledgements
Notes
References
