Article published in: International Journal of Corpus Linguistics
Vol. 27:1 (2022), pp. 31–58
Universals in machine translation?
A corpus-based study of Chinese-English translations by WeChat Translate
Published online: 14 February 2022
https://doi.org/10.1075/ijcl.19127.luo
Abstract
By examining and comparing the linguistic patterns in a self-built corpus of Chinese-English translations produced by WeChat Translate, the latest online machine translation app from the most popular social media platform (WeChat) in China, this study explores whether and to what extent simplification and normalization (hypothesized Translation Universals) manifest themselves in these translations. The results show that, whereas simplification cannot be substantiated, a tendency toward normalization in the WeChat translations can be confirmed. The study attributes these results to the operating mechanism of machine translation (MT) systems: certain salient words tend to prime WeChat’s MT system to resort repeatedly to typical language patterns, which leads to a significant overuse of lexical chunks. It is hoped that the present study can shed new light on the development of MT systems and encourage more corpus-based, product-oriented research on MT.
Article outline
- 1. Introduction
- 2. Translation Universals
- 3. Corpora and methodology
- 3.1 Corpora compilation
- 3.2 Methodology
- 4. Findings and discussion
- 4.1 Results for simplification
- 4.2 Results for normalization
- 4.2.1 NV Ratio
- 4.2.2 Correction of punctuation
- 4.2.3 The overuse of typical grammatical patterns
- 4.3 Discussion
- 5. Conclusion
- Acknowledgements
- References