In: Recent Advances in Multiword Units in Machine Translation and Translation Technology
Edited by Johanna Monti, Gloria Corpas Pastor, Ruslan Mitkov and Carlos Manuel Hidalgo-Ternero
[Current Issues in Linguistic Theory 366] 2024
pp. 173–196
Chapter 10. Semantic annotation of named rivers and its application for the prediction of multiword-term bracketing
Published online: 7 November 2024
https://doi.org/10.1075/cilt.366.10roj
Abstract
Knowledge acquisition is essential for specialized translation, and the representation of
specialized phraseology in terminological knowledge bases is part of this process. The aim of this study is
two-fold. Firstly, it describes how the semantic annotation of the predicate-argument structure of sentences mentioning
named rivers can be addressed from the perspective of Frame-based Terminology. The results show that this approach
provides valuable insights into the knowledge structures underlying the usage of named rivers in specialized texts.
Secondly, the study explores whether the bracketing of a three-component multi-word term (a ternary compound) can be
predicted from the semantic information encoded in the sentence where the ternary compound and a named river are used
as arguments. The semantic annotations permitted the construction of two machine-learning models capable of accurately
predicting ternary-compound bracketing.
Article outline
- 1. Introduction
- 2. Frame-based Terminology
- 3. Materials and methods
  - 3.1 Corpus data
  - 3.2 GeoNames geographic database
  - 3.3 Recognition of named rivers
  - 3.4 From multi-word term level to phrase level: Semantic annotation of predicate-argument structures for named rivers
    - 3.4.1 Predicate classification in lexical domains
    - 3.4.2 Semantic roles
    - 3.4.3 Semantic categories
    - 3.4.4 Semantic relations
    - 3.4.5 Inter-annotator agreement
- 4. Results of the semantic annotations
  - 4.1 Lexical domain of movement
  - 4.2 Construction of frames evoked by named rivers
- 5. Prediction of the bracketing of three-component multi-word terms
  - 5.1 Bracketing of multi-word terms
  - 5.2 Methods for bracketing prediction in the literature
  - 5.3 Semantic approach for the prediction of ternary compound bracketing
    - 5.3.1 Description of the sample of ternary compounds
    - 5.3.2 Supervised models
    - 5.3.3 Data splitting
    - 5.3.4 Model performance measures
    - 5.3.5 Construction of the supervised models
  - 5.4 Comparison of the results with previous research
- 6. Conclusions
