In: Multilingual Corpus Research: Advances and challenges
Edited by Noelia Ramón and María Pérez Blanco
[Studies in Corpus Linguistics 126] 2026
pp. 296–314

References (62)
Artetxe, M., & Schwenk, H. (2019a). Massively multilingual sentence embeddings for zero-shot cross-lingual transfer and beyond. arXiv preprint. arXiv:1812.10464.
Artetxe, M., & Schwenk, H. (2019b). Margin-based parallel corpus mining with multilingual sentence embeddings. In A. Korhonen, D. Traum, & L. Màrquez (Eds.), Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (pp. 3197–3203). Association for Computational Linguistics. Retrieved on 30 December 2020 from [URL].
Bernardini, S., Ferraresi, A., Garcea, F., & Rodriguez Blanco, N. (2023). Corpus approaches to news translation: Can we do better than comparable? In M. Kajzer-Wietrzny & A. Chmiel (Eds.), UCCTS 2023 – Book of abstracts (pp. 21–24). Retrieved on 28 October 2023 from [URL].
Bernardini, S., & Ferraresi, A. (2024). Corpus approaches to news translation: We can do better than comparable! Across Languages and Cultures, 25(2), 198–215.
Biel, Ł. (2017). Enhancing the communicative dimension of legal translation: Comparable corpora in the research-informed classroom. The Interpreter and Translator Trainer, 11(4), 316–336.
Biel, Ł., & Koźbiał, D. (2020). How do translators handle (near-)synonymous legal terms? A mixed-genre parallel corpus study into the variation of EU English-Polish competition law terminology. Estudios de Traducción, 10, 69–90.
Čermák, F., & Rosen, A. (2012). The case of InterCorp, a multilingual parallel corpus. International Journal of Corpus Linguistics, 17(3), 411–427.
Chen, P. (2024). The impact of generative AI on the role of translators and its implications for translation education. Education Insights, 1(2), 24–33.
Church, K. (2025). Comparable corpora: Opportunities for new research directions. arXiv preprint. arXiv:2501.14721v1.
Cowie, A. (1998). Phraseology: Theory, analysis and applications. Clarendon Press.
Fantinuoli, C. (2023). Towards AI-enhanced computer-assisted interpreting. In G. Corpas Pastor & B. Defrancq (Eds.), Interpreting technologies – Current and future trends (pp. 46–71). John Benjamins.
Feng, F., Yang, Y., Cer, D., Arivazhagan, N., & Wang, W. (2022). Language-agnostic BERT sentence embedding. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics, Vol. 1: Long papers (pp. 878–891). Association for Computational Linguistics. Retrieved on 20 October 2023 from [URL].
Frank, E., Hall, M., Witten, I., & Pal, C. (2016). The WEKA Workbench. Online appendix for Data mining: Practical machine learning tools and techniques (4th ed.). Morgan Kaufmann. Retrieved on 5 February 2025 from [URL].
Fu, L., & Liu, L. (2024). What are the differences? A comparative study of generative artificial intelligence translation and human translation of scientific texts. Humanities and Social Sciences Communications, 11, 1236.
Galtung, J., & Ruge, M. H. (1965). The structure of foreign news: The presentation of the Congo, Cuba and Cyprus crises in four Norwegian newspapers. Journal of Peace Research, 2(1), 64–90.
Gete, H., & Etchegoyhen, T. (2022). Making the most of comparable corpora in neural machine translation: A case study. Language Resources & Evaluation, 56, 943–971.
Grabowski, Ł. (2018). On identification of bilingual lexical bundles for translation purposes. The case of an English-Polish comparable corpus of patient information leaflets. In R. Mitkov, J. Monti, G. Corpas Pastor, & V. Seretan (Eds.), Multiword units in machine translation and translation technology (pp. 182–199). John Benjamins.
Grabowski, Ł. (2022). Provoke or encourage improvements? On semantic prosody in English-to-Polish translation. Perspectives: Studies in Translation Theory and Practice, 30, 120–136.
Granger, S., & Lefer, M.-A. (2022). Corpus-based translation and interpreting studies: A forward-looking review. In S. Granger & M.-A. Lefer (Eds.), Extending the scope of corpus-based translation studies (pp. 13–41). Bloomsbury.
Guo, M., Shen, Q., Yang, Y., Ge, H., Cer, D., Abrego, G., Stevens, K., Constant, N., Sung, Y.-H., Strope, B., & Kurzweil, R. (2018). Effective parallel corpus mining using bilingual sentence embeddings. In O. Bojar, R. Chatterjee, C. Federmann, M. Fishel, Y. Graham et al., Proceedings of the Third Conference on Machine Translation (WMT), Volume 1: Research papers (pp. 165–176). Association for Computational Linguistics.
Hareide, L. (2019). Comparable parallel corpora. A critical review of current practices in corpus-based translation studies. In I. Doval & M. T. Sánchez Nieto (Eds.), Parallel corpora for contrastive and translation studies: New resources and applications (pp. 19–38). John Benjamins.
Hjarvard, S. (2024). The globalization of language. How the media contribute to the spread of English and the emergence of medialects. Nordicom Review, 25(1), 75–97.
Hothorn, T., Hornik, K., & Zeileis, A. (2006). Unbiased recursive partitioning: A conditional inference framework. Journal of Computational and Graphical Statistics, 15(3), 651–674.
Hothorn, T., & Zeileis, A. (2015). partykit: A modular toolkit for recursive partytioning in R. Journal of Machine Learning Research, 16, 3905–3909.
Hothorn, T., Seibold, H., & Zeileis, A. (2024). Package ‘partykit’: A toolkit for recursive partytioning. Retrieved on 23 November 2020 from [URL].
Jantunen, J. (2002). Comparable corpora in translation studies: Strengths and limitations. SKY Journal of Linguistics, 5, 105–117.
Johnson, J., Douze, M., & Jégou, H. (2019). Billion-scale similarity search with GPUs. IEEE Transactions on Big Data, 7(3), 535–547.
Kajzer-Wietrzny, M., Ivaska, I., & Ferraresi, A. (2021). ‘Lost’ in interpreting and ‘found’ in translation: Using an intermodal, multidirectional parallel corpus to investigate the rendition of numbers. Perspectives: Studies in Translation Theory and Practice, 29(4), 469–488.
Klimaszewski, M., & Wróblewska, A. (2021). COMBO: State-of-the-art morphosyntactic analysis. arXiv preprint. arXiv:2109.05361.
Kruk, M., & Kałużna, A. (2024). Investigating the role of AI tools in enhancing translation skills, emotional experiences, and motivation in L2 learning. European Journal of Education, e12859.
Lapshinova-Koltunski, E. (2022). Detecting normalisation and shining-through in novice and professional. In S. Granger & M.-A. Lefer (Eds.), Extending the scope of corpus-based translation studies (pp. 182–206). Bloomsbury.
Lai, G., Dai, Z., & Yang, Y. (2020). Unsupervised parallel corpus mining on web data. arXiv preprint. arXiv:2009.08595.
Lefer, M.-A. (2020). Parallel corpora. In M. Paquot & S. T. Gries (Eds.), Practical handbook of corpus linguistics (pp. 257–282). Springer.
Lewandowska-Tomaszczyk, B., & Pęzik, P. (2018). Parallel and comparable language corpora, cluster equivalence and translator education. In Society and languages in the third millennium – Communication. Education. Translation (pp. 131–142). RUDN University.
López Arroyo, B., & Roberts, R. (2017). Genre and register in comparable corpora: An English/Spanish contrastive analysis. Meta, 62(1), 114–136.
Macanovic, A., & Przepiorka, W. (2024). Mapping individuals’ internal states from online posts. Behavior Research Methods, 56, 2782–2803.
Marco, J. (2019). Living with parallel corpora. In I. Doval & M. T. Sánchez Nieto (Eds.), Parallel corpora for contrastive and translation studies: New resources and applications (pp. 39–56). John Benjamins.
Mastropierro, L. (2020). The translation of reporting verbs in Italian: The case of the Harry Potter series. International Journal of Corpus Linguistics, 25(3), 241–269.
Mastropierro, L., & Grabowski, Ł. (2024). Repeated reporting verbs in English novels and their Italian and Polish translations: A preliminary multifactorial study. Across Languages and Cultures, 25(2), 310–330.
Mikhailov, M., & Cooper, R. (2016). Corpus linguistics for translation and contrastive studies. A guide for research. Routledge.
Philip, G. (2009). Arriving at equivalence: Making a case for comparable general reference corpora in translation studies. In A. Beeby, P. Rodríguez-Inés, & P. Sánchez-Gijón (Eds.), Corpus use and translating: Corpus use for learning to translate and learning corpus use to translate (pp. 59–73). John Benjamins.
Pęzik, P. (2014). Graph-based analysis of collocational profiles. In V. Jesenšek & P. Grzybek (Eds.), Phraseologie im Wörterbuch und Korpus (pp. 227–243). Filozofska fakulteta.
Pęzik, P. (2016). Exploring phraseological equivalence with Paralela. In E. Gruszczyńska & A. Leńko-Szymańska (Eds.), Polish-language parallel corpora (pp. 67–81). Instytut Lingwistyki Stosowanej UW.
Pęzik, P. (2018). Facets of prefabrication. Perspectives on modelling and detecting phraseological units. Wydawnictwo Uniwersytetu Łódzkiego.
Pęzik, P. (2020). Budowa i zastosowania korpusu monitorującego MoncoPL. Forum Lingwistyczne, 7, 133–150.
Pęzik, P. (2021). Exploring the valency of collocational chains. In A. Trklja & Ł. Grabowski (Eds.), Formulaic language: Theories and methods (pp. 53–78).
Quinlan, R. (1993). C4.5: Programs for machine learning. Morgan Kaufmann.
Rabadán, R., & Izquierdo, M. (2013). A corpus-based analysis of English affixal negation translated into Spanish. In K. Aijmer & B. Altenberg (Eds.), Advances in corpus-based contrastive linguistics: Studies in honour of Stig Johansson (pp. 57–82). John Benjamins.
Ramón, N. (2023). Exploring near-synonyms through translation corpora. A case study on ‘begin’ and ‘start’ in the English-Spanish parallel corpus P-ACTRES. In M. Izquierdo & Z. Sanz-Villar (Eds.), Corpus use in cross-linguistic research: Paving the way for teaching, translation and professional communication (pp. 91–107). John Benjamins.
Sanjurjo-González, H., & Izquierdo, M. (2019). P-ACTRES 2.0: A parallel corpus for cross-linguistic research. In I. Doval & M. T. Sánchez Nieto (Eds.), Parallel corpora for contrastive and translation studies: New resources and applications (pp. 215–231). John Benjamins.
Schwenk, H., & Douze, M. (2017). Learning joint multilingual sentence representations with neural machine translation. arXiv preprint. arXiv:1704.04154.
Sharoff, S., Rapp, R., & Zweigenbaum, P. (2023a). Building comparable corpora. In Building and using comparable corpora for multilingual natural language processing (pp. 17–37). Springer.
Sharoff, S., Rapp, R., & Zweigenbaum, P. (2023b). Other applications of comparable corpora. In Building and using comparable corpora for multilingual natural language processing (pp. 117–128). Springer.
Tannenbaum, P. H. (1953). The effect of headlines on the interpretation of news stories. Journalism Quarterly, 30(2), 189–197.
Teich, E. (2003). Cross-linguistic variation in system and text: A methodology for the investigation of translations and comparable texts. Mouton de Gruyter.
Tiedemann, J. (2011). News from OPUS – A collection of multilingual parallel corpora with tools and interfaces. In N. Nicolov, K. Bontcheva, G. Angelova, & R. Mitkov (Eds.), Recent advances in natural language processing (Vol. V, pp. 237–248). John Benjamins.
Xia, G. (2020). A comparable-corpus-based study of informal features in academic writing by English and Chinese scholars across disciplines. Ibérica, 39, 119–140.
Yuxiu, Y. (2024). Application of translation technology based on AI in translation teaching. Systems and Soft Computing, 6, 200072.
Zanettin, F. (2014). Corpora in translation. In J. House (Ed.), Translation: A multidisciplinary approach (pp. 178–199). Palgrave Macmillan.