In: Multiword Units in Machine Translation and Translation Technology
Edited by Ruslan Mitkov, Johanna Monti, Gloria Corpas Pastor and Violeta Seretan
[Current Issues in Linguistic Theory 341] 2018
pp. 41–60
Analysing linguistic information about word combinations for a Spanish-Basque rule-based machine translation system
Published online: 20 July 2018
https://doi.org/10.1075/cilt.341.02inu
Abstract
This paper describes an in-depth analysis of noun + verb combinations in Spanish-Basque translations. Firstly, we
examined noun + verb constructions in the dictionary, and confirmed that this kind of MWU varies considerably from
language to language, which justifies the need for their specific treatment in MT systems. Then, we searched for those
combinations in a parallel corpus, and we selected the most frequently occurring ones to analyse them further and
classify them according to their level of syntactic fixedness and semantic compositionality. We tested whether adding
linguistic data relevant to MWUs improved the detection of Spanish combinations, and we found that, indeed, the number
of MWUs identified increased by 30.30% with a precision of 97.61%. Finally, we also evaluated how an RBMT system
translated the MWUs we analysed, and concluded that at least 44.44% needed to be corrected or improved.
Article outline
- 1. Introduction
- 2. Definitions, challenges and treatment of MWUs in MT
- 3. Linguistic analysis of Basque and Spanish noun + verb combinations
- 3.1 Noun + verb combinations in bilingual dictionaries
- 3.1.1 Basque and Spanish noun + verb combinations in the dictionary
- 3.1.2 Translations of noun + verb combinations in the dictionary
- 3.1.3 Equivalences of noun + verb constructions in translations
- 3.2 Contrasting information with parallel corpora
- 3.3 Classification of the Spanish MWUs
- 3.3.1 Syntactic flexibility
- 3.3.2 Semantic compositionality
- 4. Evaluation of MWU detection and translation adequacy
- 4.1 Evaluation of MWU detection
- 4.2 Evaluation of MWU translation quality in an RBMT system
- 5. Conclusions and future work
