In: Recent Advances in Multiword Units in Machine Translation and Translation Technology
Edited by Johanna Monti, Gloria Corpas Pastor, Ruslan Mitkov and Carlos Manuel Hidalgo-Ternero
[Current Issues in Linguistic Theory 366] 2024
pp. 79–102
Chapter 5. Evaluating a bracketing protocol for multiword terms
Published online: 7 November 2024
https://doi.org/10.1075/cilt.366.05leo
Abstract
Multiword terms (MWTs) are frequently used to encapsulate and convey meaning in scientific and
technical texts. However, they can also make these texts difficult to understand because the relations between
their constituents are not transparent. When MWTs have more than two constituents, a dependency analysis (bracketing) is
often necessary to facilitate their interpretation. NLP research has proposed various models to automate bracketing,
but none has been entirely satisfactory. This paper presents a protocol that combines various models and
applies it to a set of three-constituent MWTs in order to: (i) sort rules by their disambiguation potential, based on
their likelihood of retrieving results from any corpus and their ability to resolve bracketing; and (ii) ascertain the
influence of corpus size and type on the results obtained.
Keywords: multiword term, bracketing, structural disambiguation, corpus, terminology
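As a rough illustration of what bracketing a three-constituent MWT involves, the sketch below applies a simple adjacency-style comparison: it checks whether the first two constituents or the last two co-occur more often in a corpus and groups the more frequent pair. This is not the protocol evaluated in the chapter. The function name `bracket_adjacency`, the example term, and the frequency counts are all hypothetical and chosen only for illustration.

```python
from typing import Mapping, Optional, Tuple

Pair = Tuple[str, str]


def bracket_adjacency(term: Tuple[str, str, str],
                      counts: Mapping[Pair, int]) -> Optional[str]:
    """Bracket a three-constituent term by comparing adjacent-pair counts.

    Returns a left-bracketed or right-bracketed string, or None when the
    counts are tied and give no evidence either way.
    """
    w1, w2, w3 = term
    left = counts.get((w1, w2), 0)   # evidence for [[w1 w2] w3]
    right = counts.get((w2, w3), 0)  # evidence for [w1 [w2 w3]]
    if left > right:
        return f"[[{w1} {w2}] {w3}]"
    if right > left:
        return f"[{w1} [{w2} {w3}]]"
    return None  # tie: no decision


# Hypothetical corpus counts, invented for this example only.
counts = {("wind", "turbine"): 120, ("turbine", "blade"): 45}
print(bracket_adjacency(("wind", "turbine", "blade"), counts))
```

With these made-up counts the first pair is more frequent, so the term is grouped as left-bracketed. A tie returns `None`, which reflects the point in the abstract that a rule is only useful when it both retrieves corpus results and actually resolves the bracketing.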
Article outline
- 1. Introduction
- 2. Bracketing models
- 3. Materials and methods
- 3.1 MWT extraction and manual bracketing
- 3.2 Queries
- 3.3 Bracketing rules
- 4. Results
- 4.1 Rule comparison
- 4.1.1 Quantitative performance of the rules
- 4.1.2 Qualitative performance of the rules
- 4.1.3 Quantitative and qualitative performance of the rules
- 4.2 Comparison of corpora
- 4.3 Comparison of MWT bracketing
- 5. Conclusions
