Article published in: Approaches to Machine Translation
Edited by Mahdieh Fakhar, Monica Vilhelm and Paz Díez-Arcón
[Translation and Translanguaging in Multilingual Contexts 11:1] 2025
pp. 5–30
Powerful variables for knowledge representation and bracketing prediction
Published online: 7 January 2025
https://doi.org/10.1075/ttmc.00151.roj
Abstract
The acquisition of knowledge is essential for specialized
translation, and the representation of specialized phraseology in terminological
knowledge bases facilitates this process. The aim of this study is two-fold.
Firstly, it describes how the semantic annotation of the predicate-argument
structure of sentences mentioning named rivers can be addressed from the
perspective of Frame-based Terminology. The results show that this approach,
including the semantic variables of verb lexical domain, semantic role, and
semantic category, provides valuable insights into the knowledge structures
underlying the usage of named rivers in specialized texts. Secondly, this study
explores whether the bracketing of a three-component multiword term can be
predicted from the semantic information encoded in the sentence where the
ternary compound and a named river are used as arguments. The semantic variables
of lexical domain, semantic role, and semantic category allowed us to construct
two machine-learning models capable of accurately predicting ternary-compound
bracketing.
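As a minimal illustration of the task described in the abstract, the sketch below shows the two possible bracketings of a three-component term and one common way of turning categorical semantic variables into numeric input for a supervised classifier. It is not the authors' implementation: the example compounds, the feature labels (`MOVEMENT`, `THEME`, and so on), and the function names `bracket` and `one_hot` are all hypothetical, and the abstract does not say which models or encodings were used.

```python
from typing import Dict, List, Tuple


def bracket(words: Tuple[str, str, str], kind: str) -> str:
    """Render a three-word compound with explicit left or right bracketing."""
    a, b, c = words
    if kind == "left":
        # The first two words form a unit that modifies the third.
        return f"[[{a} {b}] {c}]"
    if kind == "right":
        # The first word modifies the unit formed by the last two.
        return f"[{a} [{b} {c}]]"
    raise ValueError(f"unknown bracketing: {kind}")


def one_hot(rows: List[Dict[str, str]]) -> Tuple[List[str], List[List[int]]]:
    """One-hot encode categorical features into a binary matrix."""
    # Each distinct "feature=value" pair becomes one column.
    columns = sorted({f"{k}={v}" for row in rows for k, v in row.items()})
    matrix = []
    for row in rows:
        active = {f"{k}={v}" for k, v in row.items()}
        matrix.append([int(col in active) for col in columns])
    return columns, matrix


# Hypothetical annotated sentences: the values are illustrative only,
# not taken from the study's data.
rows = [
    {"lexical_domain": "MOVEMENT", "semantic_role": "THEME",
     "semantic_category": "PROCESS"},
    {"lexical_domain": "CHANGE", "semantic_role": "AGENT",
     "semantic_category": "PROCESS"},
]
columns, matrix = one_hot(rows)

print(bracket(("sea", "level", "rise"), "left"))
print(bracket(("coastal", "sediment", "transport"), "right"))
```

The resulting `matrix`, paired with a left/right label for each row, is the kind of input a standard supervised classifier could be trained on.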
Article outline
- 1. Introduction
- 2. Frame-based Terminology
- 3. Materials and methods
- 3.1 Corpus data
- 3.2 GeoNames geographic database
- 3.3 Recognition of named rivers
- 3.4 From multiword-term level to phrase level: Semantic annotation of predicate-argument structures for named rivers
- 3.4.1 Predicate classification in lexical domains
- 3.4.2 Semantic roles
- 3.4.3 Semantic categories
- 3.4.4 Semantic relations
- 3.4.5 Inter-annotator agreement
- 4. Results of the semantic annotations
- 4.1 Lexical domain of action
- 4.2 Construction of frames evoked by named rivers
- 5. Prediction of the bracketing of three-component multiword terms
- 5.1 Bracketing of multiword terms
- 5.2 Methods for bracketing prediction in the literature
- 5.3 Semantic approach to the prediction of ternary-compound bracketing
- 5.3.1 Description of the sample of ternary compounds
- 5.3.2 Supervised models
- 5.3.3 Data splitting
- 5.3.4 Model performance measures
- 5.3.5 Construction of the supervised models
- 5.4 Comparison of the results with previous research
- 6. Conclusions
- Notes
