In: Handbook of Terminology: Volume 3. Legal Terminology
Edited by Łucja Biel and Hendrik J. Kockaert
[Handbook of Terminology 3] 2023
pp. 397–430
EU phraseological verbal patterns in the PETIMOD 2.0 corpus
A NER-enhanced approach
Available under the Creative Commons Attribution-NoDerivatives (CC BY-ND) 4.0 license.
For any use beyond this license, please contact the publisher at rights@benjamins.nl.
Published online: 1 December 2023
https://doi.org/10.1075/hot.3.phr1
Abstract
Texts from the European Union exhibit a high degree of formulaicity (Biel 2014). This chapter studies phraseological patterns in PETIMOD 2.0, an English<>Spanish intermodal corpus of the EU Committee on Petitions. The first part gives a brief overview of corpus-based research on EU institutional phraseology, with a focus on contrastive approaches and parliamentary corpora. The second part examines the formulaicity of named entities and their verbal patterns in PETIMOD 2.0. We hypothesize that corpus-based Named Entity Recognition (NER) is the most suitable method for extracting relevant argument-structure constructions from such texts. The results reveal different degrees of formulaicity across languages and modes, as well as common features motivated by the pragmatics of the Petitions Committee.
Article outline
- 1. Introduction
- 2. Related work
- 3. Study goals and methodology
- 3.1 Units of analysis
- 3.2 Choice of corpus
- 3.2.1 Corpus size
- 3.2.2 Transcription conventions and revisions
- 3.3 Named entity recognition
- 3.3.1 Extraction of entities and system performance
- 3.3.2 Phraseological Pattern Extraction for NEs
- 4. Results and discussion
- 4.1 Distribution of NEs
- 4.2 Text-organizing patterns
- 4.3 Grammatical patterns
- 4.4 Term-embedding collocations
- 5. Conclusion
References (46)
Aston, Guy. 2018. “Acquiring the Language of Interpreters: A Corpus-based Approach.” In Making Way in Corpus-based Interpreting Studies, edited by Mariachiara Russo, Claudio Bendazzoli & Bart Defrancq, 83–96. Singapore: Springer.
Bernardini, Silvia, Adriano Ferraresi, Mariachiara Russo, Camille Collard, and Bart Defrancq. 2018. “Building Interpreting and Intermodal Corpora: A How-to for a Formidable Task.” In Making Way in Corpus-based Interpreting Studies, edited by Mariachiara Russo, Claudio Bendazzoli & Bart Defrancq, 21–42. Singapore: Springer.
Biel, Łucja. 2014. “Phraseology in legal translation: A corpus-based analysis of textual mapping in EU law.” In The Ashgate Handbook of Legal Translation, edited by Le Cheng, King Kui Sin & Anne Wagner, 177–192.
Biel, Łucja. 2018. “Lexical bundles in EU law: The impact of translation process on the patterning of legal language.” In Phraseology in Legal and Institutional Settings: A Corpus-Based Interdisciplinary Perspective, edited by Stanisław Goźdź-Roszkowski and Gianluca Pontrandolfo, 11–26. London: Routledge.
Biel, Łucja. 2021. “Eurolects and EU Legal Translation.” In The Oxford Handbook of Translation and Social Practices, edited by Meng Ji and Sara Laviosa, 477–500. Online: Oxford University Press.
Biel, Łucja, Agnieszka Biernacka, and Anna Jopek-Bosiacka. 2018. “Collocations of Terms in EU Competition Law: A Corpus Analysis of EU English Collocations.” In Language and Law: The Role of Language and Translation in EU Competition Law, edited by Silvia Marino, Łucja Biel, Martina Bajčić and Vilelmini Sosoni, 249–274. Cham: Springer International Publishing.
Biel, Łucja and Agnieszka Doczekalska. 2020. “How do supranational terms transfer into national legal systems?” Terminology. International Journal of Theoretical and Applied Issues in Specialized Communication 26(2):184–212.
Biel, Łucja, Dariusz Koźbiał, and Katarzyna Wasilewska. 2019. “The formulaicity of translations across EU institutional genres: A corpus-driven analysis of lexical bundles in translated and non-translated language.” Translation Spaces 8(1):67–92.
Biel, Łucja and Izabela Pytel. 2021. “Corrigenda of EU Legislative Acts as an Indicator of Quality Assurance Failures.” In Institutional Translation and Interpreting, edited by Fernando Prieto Ramos, 150–173. New York: Routledge.
Blini, Lorenzo. 2018. “Observing Eurolects: The case of Spanish.” In Observing Eurolects: Corpus analysis of linguistic variation in EU law, edited by Laura Mori, 329–367.
Burtsev, Mikhail, Alexander Seliverstov, Rafael Airapetyan, Mikhail Arkhipov, Dilyara Baymurzina, Nickolay Bushkov, and Marat Zaynutdinov. 2018. “Deeppavlov: Open-source library for dialogue systems.” In Proceedings of ACL 2018, System Demonstrations, 122–127. Melbourne: Association for Computational Linguistics.
Corpas Pastor, Gloria. 2017. “Collocations in E-Bilingual Dictionaries: From Underlying Theoretical Assumptions to Practical Lexicography and Translation Issues.” In Collocations and Other Lexical Combinations in Spanish. Theoretical and Applied Approaches, edited by Sergi Torner Castells and Elisenda Bernal, 139–160. London: Routledge.
Corpas Pastor, Gloria. 2021. “Technology Solutions for Interpreters: The VIP System.” Hermēneus. Revista de Traducción e Interpretación 23:91–123.
Corpas Pastor, Gloria and Fernando Sánchez Rodas. 2022. “NLP-enhanced Shift Analysis of Named Entities in an English Spanish Intermodal Corpus of European Petitions.” In Mediated discourse at the European Parliament: Empirical investigations, edited by Marta Kajzer-Wietrzny, Adriano Ferraresi, Ilmari Ivaska and Silvia Bernardini, 219–251. Berlin: Language Science Press. [URL]
Devlin, Jacob, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. “Bert: Pre-training of deep bidirectional transformers for language understanding.” [URL]
Dobrić Basaneže, Katja. 2017. “Interpreting Phraseological Units in Contracts: The Case of Extended Term–Embedding Collocation.” Suvremena Lingvistika 43(84):199–216.
European Parliament. 2018. Guidelines: Committee on Petitions. [URL]
European Union. 2012a. Interinstitutional Style Guide 2011. Luxembourg: Publications Office of the European Union.
European Union. 2012b. Libro de estilo interinstitucional 2011. Luxembourg: Publications Office of the European Union.
Ferraresi, Adriano, Silvia Bernardini, Marie-Aude Lefer, and Maja Miličević. 2017. “Investigating the language of written translation and simultaneous interpretation: Simplification in EPTIC.” In Congrès Mondial de Traductologie (Université de Paris-Nanterre, du 10/04/2017 au 14/04/2017). [URL]
Ferraresi, Adriano and Maja Miličević. 2017. “Phraseological patterns in interpreting and translation: similar or different?” In Empirical Translation Studies: New Methodological and Theoretical Traditions, edited by Gert De Sutter, Marie-Aude Lefer and Isabelle Delaere, 157–182. Berlin: De Gruyter Mouton.
Goźdź-Roszkowski, Stanisław. 2011. Patterns of Linguistic Variation in American Legal English: A Corpus-Based Study. Frankfurt am Main: Peter Lang.
Goźdź-Roszkowski, Stanisław. 2012. “Discovering Patterns and Meanings: Corpus Perspectives on Phraseology in Legal Discourse.” Roczniki Humanistyczne 60(8):47–70. [URL]
Goźdź-Roszkowski, Stanisław and Gianluca Pontrandolfo, eds. 2018. Phraseology in legal and institutional settings: A corpus-based interdisciplinary perspective. London: Routledge.
Goźdź-Roszkowski, Stanisław and Gianluca Pontrandolfo. 2015a. “Legal Phraseology Today: Corpus-based Applications Across Legal Languages and Genres.” Fachsprache 37(3–4):130–138. 10.24989/fs.v37i3-4.1287.
Henriksen, Line. 2007. “The song in the booth: Formulaic interpreting and oral textualisation.” Interpreting 9(1):1–20.
Hrežo, Vladimir. 2020. “Exploring Phraseology in EU Legal Discourse.” Language – Culture – Politics 1:29–52.
Jacquet, Guillaume, Maud Ehrmann, Jakub Piskorski, Hristo Tanev, and Ralf Steinberger. 2019. “Cross-lingual linking of multi-word entities and language-dependent learning of multi-word entity patterns.” In Representation and Parsing of Multiword Expressions: Current trends, edited by Yannick Parmentier and Jakub Waszczuk, 269–297. Berlin: Language Science Press.
Kajzer-Wietrzny, Marta and Łukasz Grabowski. 2021. “Formulaicity in Constrained Communication: An Intermodal Approach.” MonTI. Monografías de Traducción e Interpretación 13:148–83.
Klabal, Ondřej. 2019. “Corpora in Legal Translation: Overcoming Terminological and Phraseological Assymetries between Czech and English.” CLINA: Revista Interdisciplinaria de Traducción, Interpretación y Comunicación Intercultural 5(2):165–86.
Nasar, Zara, Syed Waqar Jaffry, and Muhammad Kamran Malik. 2021. “Named Entity Recognition and Relation Extraction: State-of-the-Art.” ACM Computing Surveys 54(1):1–39.
Nouvel, Damien, Maud Ehrmann, and Sophie Rosset, eds. 2016. Named Entities for Computational Linguistics. Hoboken, NJ: John Wiley & Sons, Inc.
Pontrandolfo, Gianluca. 2011. “Phraseology in criminal judgments: A corpus study of original vs. translated Italian.” Sendebar 22:209–234.
Pontrandolfo, Gianluca. 2015. “Investigating Judicial Phraseology with COSPE: A contrastive Corpus-based Study.” In New directions in corpus-based translation studies, edited by Claudio Fantinuoli and Federico Zanettin, 137–159. Berlin: Language Science Press.
Pontrandolfo, Gianluca. 2021. “National and EU judicial phraseology under the magnifying glass: a corpus-assisted analysis of complex prepositions in Spanish.” Perspectives 29(2):260–277.
Prieto Ramos, Fernando. 2021. “Translating legal terminology and phraseology: between inter-systemic incongruity and multilingual harmonization.” Perspectives 29(2):175–183.
Sandrelli, Annalisa. 2018. “Observing Eurolects: The case of English.” In Observing Eurolects: Corpus analysis of linguistic variation in EU law, edited by Laura Mori, 63–92.
Santandrea, Manuela. 2014. Le collocazioni in traduzione e interpretazione tra italiano e inglese: uno studio su EPTIC_01_2011. Università di Bologna. [URL]
Seracini, Francesca L. 2020. “Phraseology in multilingual EU legislation: a corpus-based study of translated multi-word terms.” Perspectives 29:245–259.
Steinberger, Josef, Polina Lenkova, Mijail Kabadjov, Ralf Steinberger, and Erik Van Der Goot. 2011. “Multilingual entity-centered sentiment analysis evaluated by parallel corpora.” In International Conference Recent Advances in Natural Language Processing, RANLP, edited by Galia Angelova, Kalina Bontcheva, Ruslan Mitkov & Nikolai Nikolov, 770–775. Hissar: Association for Computational Linguistics.
Teich, Elke. 2003. Cross-Linguistic Variation in System and Text: A Methodology for the Investigation of Translations and Comparable Texts. Berlin: Mouton de Gruyter.
Trklja, Aleksandar. 2018. “A corpus investigation of formulaicity and hybridity in legal language: A case of EU case law texts.” In Phraseology in Legal and Institutional Settings: A Corpus-Based Interdisciplinary Perspective, edited by Stanisław Goźdź-Roszkowski and Gianluca Pontrandolfo, 89–108. London: Routledge.
Vigier-Moreno, Francisco Javier and María del Mar Sánchez Ramos. 2017. “Using parallel corpora to study the translation of legal system-bound terms: The case of names of English and Spanish courts.” In Computational and Corpus-Based Phraseology. Second International Conference, Europhras 2017 London, UK, November 13–14, 2017 Proceedings, edited by Ruslan Mitkov, 260–273. Cham: Springer.
