Looking under the hood for evidence of normalization: Multivariate exploratory analysis of lexical bundles

Lee, Changsoo

doi:10.54754/incontext.v2i1.11

Article published in: InContext
Vol. 2:1 (2022), pp. 57–83

Changsoo Lee | Hankuk University of Foreign Studies, Seoul, Korea

Available under the Creative Commons Attribution-NonCommercial (CC BY-NC) 4.0 license.

For any use beyond this license, please contact the publisher at rights@benjamins.nl.

Published online: 28 April 2022

Abstract

This study investigated the normalization hypothesis and stylistic variation across translators, as manifested in the use of lexical bundles in translated and non-translated English literary texts. Normalization, originally proposed as ‘conservatism’ by Baker (1996), is the hypothesis that translators tend to conform to linguistic patterns and conventions typical of the target language, even to the point of exaggeration. Lexical bundles are sequences of three or four words that recur with high frequency in natural discourse. The study was carried out in two stages. The first stage replicated previous studies that relied on simple frequency tests to confirm the normalization hypothesis. Contrary to these earlier studies, the present study’s frequency tests on lexical bundles failed to provide clear support for the hypothesis. The second stage employed two types of multivariate exploratory analysis, principal component analysis (PCA) and hierarchical cluster analysis (HCA), to examine the underlying relationships among individual texts, lexical bundles, and the translated and non-translated group categories. Because the frequency tests had failed to support normalization, it was hypothesized that normalization might still be present in the translated corpus but restricted to certain types of lexical bundles. PCA confirmed this hypothesis by revealing that normalization occurred in the use of one functional type of lexical bundle, discourse bundles, which are relatively free from the thematic content of the text in which they occur. This supports the traditional idea that statistical tests of translation hypotheses must deal with linguistic features unrelated to the thematic content of the corpus. Additionally, PCA revealed variation in the types of lexical bundles preferred by individual translators. HCA further identified a subgroup of translated texts that cluster with non-translated texts rather than with their fellow translated texts. This was taken as indicating that the use of lexical bundles varied among the translators and that the division between translated and non-translated texts is not clear-cut.

Keywords: corpus-based translation studies, normalization, principal component analysis, hierarchical cluster analysis, Korean-English literary translation
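
The pipeline summarized in the abstract (extract three- and four-word bundles, tabulate their frequencies per text, then explore how texts group together) can be sketched roughly as follows. This is only an illustration, not the study's procedure: the abstract does not describe its tooling, and the whitespace tokenizer, the `min_texts` dispersion cutoff, the per-1,000-word normalization, and the choice of Euclidean distance with average linkage are all assumptions made here. PCA is omitted to keep the sketch dependency-free.

```python
import math
from collections import Counter


def bundles(tokens, n):
    """Return every contiguous n-word sequence in a list of tokens."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bundle_matrix(texts, sizes=(3, 4), min_texts=2):
    """Build a text-by-bundle table of frequencies per 1,000 words.

    texts: dict mapping a text label to its raw string.
    min_texts: hypothetical cutoff; a bundle is kept only if it occurs
    in at least this many different texts.
    """
    counts, lengths = {}, {}
    for label, raw in texts.items():
        tokens = raw.lower().split()  # naive whitespace tokenizer (assumption)
        lengths[label] = max(len(tokens), 1)  # guard against empty texts
        counts[label] = Counter(b for n in sizes for b in bundles(tokens, n))
    # Number of texts in which each bundle appears at least once.
    spread = Counter(b for c in counts.values() for b in c)
    kept = sorted(b for b, k in spread.items() if k >= min_texts)
    rows = {
        label: [1000 * counts[label][b] / lengths[label] for b in kept]
        for label in texts
    }
    return kept, rows


def euclidean(a, b):
    """Euclidean distance between two equal-length frequency vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(rows):
    """Agglomerative clustering with average linkage (assumed method).

    Returns the merges in the order they happen, each as
    (cluster_a, cluster_b, distance).
    """
    clusters = [(label,) for label in rows]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                total = sum(euclidean(rows[a], rows[b])
                            for a in clusters[i] for b in clusters[j])
                d = total / (len(clusters[i]) * len(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges
```

Calling `bundle_matrix` on a dictionary of labelled texts and passing the resulting rows to `average_linkage` yields a merge order that can be read like a dendrogram: a translated text that merges early with non-translated texts would correspond to the kind of mixed subgroup the abstract reports. A real analysis would more likely rely on an established statistics package than on this hand-rolled version.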

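Korean Abstract

This study aims to examine the translation normalization hypothesis and stylistic differences among translators on the basis of patterns of lexical bundle use in translated and non-translated English literary texts. Normalization is a hypothesis first proposed by Baker (1996) under the name of conservatism, and it holds that translators tend to follow the linguistic patterns and norms of the target language to the point of exaggeration. Lexical bundles are lexical patterns that occur with high frequency in natural discourse, units in which several words are linked in sequence. The study proceeded in two stages. The first stage replicated earlier studies that had confirmed the normalization hypothesis on the basis of frequency analysis, in order to verify their results. Unlike the earlier studies, the frequency analysis in this study did not yield results supporting normalization. The second stage used two multivariate exploratory statistical methods, principal component analysis (PCA) and hierarchical cluster analysis (HCA), to analyze the underlying relationships among individual texts, lexical bundles, and the translated and non-translated group categories. Following the frequency analysis that failed to confirm normalization, the second stage formulated and tested the hypothesis that normalization is still present in the translated corpus but is restricted to particular bundle types. In the PCA, normalization was observed only in a particular kind of lexical bundle, discourse bundles, which are relatively free from the thematic content of the text, a result that supports this hypothesis. This supports the existing claim that statistical analyses of translation universals should use linguistic features that are not affected by the thematic content of the corpus. PCA also showed differences among translators in the types of lexical bundles they preferred. HCA additionally confirmed that some translated texts stand apart from the other translated texts and form clusters with non-translated texts. This result is interpreted as showing that patterns of lexical bundle use also differ among translators and that the boundary between translated and non-translated texts is not clear-cut.

Keywords: corpus-based translation studies, normalization, principal component analysis, hierarchical cluster analysis, Korean-English literary translation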

References (32)

Baayen, R. Harald. (2008). Analyzing Linguistic Data: A Practical Introduction to Statistics Using R. Cambridge University Press.

Baker, Mona. (2004). A corpus-based view of similarity and difference in translation. International Journal of Corpus Linguistics, 9(2), 167–193.

Baker, Mona. (1996). Corpus-based translation studies: The challenges that lie ahead. In Harold Somers (Ed.), Terminology, LSP, and Translation: Studies in Language Engineering in Honour of Juan C. Sager (pp. 175–186). John Benjamins.

Biber, Douglas and Federica Barbieri. (2007). Lexical bundles in university spoken and written registers. English for Specific Purposes, 26(3), 263–286.

Biber, Douglas and Susan Conrad. (1999). Lexical bundles in conversation and academic prose. In Hilde Hasselgård & Signe Oksefjell (Eds.), Out of Corpora: Studies in Honour of Stig Johansson (pp. 181–189). Rodopi.

Biber, Douglas, Susan Conrad and Viviana Cortes. (2004). If you look at …: Lexical bundles in university teaching and textbooks. Applied Linguistics, 25(3), 371–405.

Biber, Douglas, Stig Johansson, Geoffrey Leech, Susan Conrad and Edward Finegan. (1999). Longman Grammar of Spoken and Written English. Pearson Education Ltd.

Burrows, John. (2002). ‘Delta’: A measure of stylistic difference and a guide to likely authorship. Literary and Linguistic Computing, 17(3), 267–287.

Burrows, John. (1987). Word-patterns and story-shapes: The statistical analysis of narrative style. Literary and Linguistic Computing, 2(2), 61–70.

De Sutter, Gert, Isabelle Delaere and Koen Plevoets. (2012). Lexical lectometry in corpus-based translation studies: Combining profile-based correspondence analysis and logistic regression modeling. In Michael P. Oakes & Meng Ji (Eds.), Quantitative Methods in Corpus-based Translation Studies: A Practical Guide to Descriptive Translation Research (pp. 325–346). John Benjamins.

Everitt, Brian and Torsten Hothorn. (2011). An Introduction to Applied Multivariate Analysis with R. Springer.

Forsyth, Richard S. and Phoenix W. Y. Lam. (2014). Found in translation: To what extent is authorial discriminability preserved by translators? Literary and Linguistic Computing, 29(2), 199–217.

Grabowski, Łukasz. (2013). Interfacing corpus linguistics and computational stylistics: Translation universals in translational literary Polish. International Journal of Corpus Linguistics, 18(2), 254–280.

Husson, François, Sébastien Lé and Jérôme Pagès. (2011). Exploratory Multivariate Analysis by Example Using R. CRC Press.

Hyland, Ken. (2008). As can be seen: Lexical bundles and disciplinary variation. English for Specific Purposes, 27(1), 4–21.

Jenset, Gard B. (2008). Basic R for corpus linguistics [Seminar handout]. University of Bergen. Retrieved February 12, 2014 from [URL]

Jenset, Gard B. and Barbara McGillivray. (2012). Multivariate analyses of affix productivity in translated English. In Michael P. Oakes & Meng Ji (Eds.), Quantitative Methods in Corpus-based Translation Studies: A Practical Guide to Descriptive Translation Research (pp. 301–324). John Benjamins.

Kenny, Dorothy. (2001). Lexis and Creativity in Translation: A Corpus-based Study. St. Jerome Publishing.

Kenny, Dorothy. (2000). Translators at play: Exploitations of collocational norms in German-English translation. In Bill Dodd (Ed.), Working with German Corpora: With a Foreword by John Sinclair (pp. 143–160). University of Birmingham Press.

Kenny, Dorothy. (1999). The German-English parallel corpus of literary texts (GEPCOLT): A resource for translation scholars. Teanga, 1(18), 25–42.

Kenny, Dorothy. (1998). Creatures of habit? What translators usually do with words. Meta, 43(4), 515–523.

Lee, Changsoo. (2021). How do machine translators measure up to human literary translators in stylometric tests? Digital Scholarship in the Humanities, 1–17.

Lee, Changsoo. (2013). Using lexical bundle analysis as discovery tool for corpus-based translation research. Perspectives, 21(3), 378–395.

Malmkjaer, Kirsten. (1998). Love thy neighbour: Will parallel corpora endear linguists to translators? Meta, 43(4), 534–541.

Raykov, Tenko and George A. Marcoulides. (2008). An Introduction to Applied Multivariate Analysis. Routledge.

Rybicki, Jan. (2006). Burrowing into translation: Character idiolects in Henryk Sienkiewicz’s trilogy and its two English translations. Literary and Linguistic Computing, 21(1), 91–103.

Rybicki, Jan and Magda Heydel. (2013). The stylistics and stylometry of collaborative translation: Woolf’s Night and Day in Polish. Literary and Linguistic Computing, 28(4), 708–717.

Scott, Mike and Christopher Tribble. (2006). Textual Patterns: Key Words and Corpus Analysis in Language Education. John Benjamins.

Toury, Gideon. (1995). Descriptive Translation Studies and Beyond. John Benjamins.

Toury, Gideon. (1980). In Search of a Theory of Translation. Porter Institute for Poetics and Semiotics, Tel Aviv University.

Venuti, Lawrence. (1995). The Translator’s Invisibility. Routledge.

Xiao, Richard. (2010). Idioms, word clusters, and reformulation markers in translational Chinese: Can “translation universals” survive in Mandarin? In Richard Xiao (Ed.), Proceedings of the International Symposium on Using Corpora in Contrastive and Translation Studies (pp. 1–40). [URL]