In: Multiword Units in Machine Translation and Translation Technology
Edited by Ruslan Mitkov, Johanna Monti, Gloria Corpas Pastor and Violeta Seretan
[Current Issues in Linguistic Theory 341] 2018
pp. 103–124
Multiword expressions in multilingual information extraction
Published online: 20 July 2018
https://doi.org/10.1075/cilt.341.05thu
Abstract
Multilingual Information Extraction requires significant Multiword Expression (MWE) processing, as many of the relevant
items are multiwords. The lexical representation of MWEs supports large bilingual lexicons (for Persian, Pashto, Turkish,
Arabic); multiwords are represented like single words, extended by two annotations: the MWE head, and the lemma plus part
of speech of each MWE part. In text analysis, MWEs are recognised as part of the parsing process, not as pre- or
post-processing components. The analysis design extends the X-bar scheme by a level for multiword rules. In transfer,
MWEs are translated as elementary nodes, like single-word lemmata, to present key concepts for relevance judgement in
Information Extraction. Evaluation shows that 90% of the MWE patterns in the lexicon can be analysed with about 150
MWE-specific rules, and that more than 90% of text document tokens are covered by the proposed integrated single-word and
multiword processing.
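The lexical representation described above can be illustrated with a minimal sketch. It assumes a simple record layout; the class names, field names, part-of-speech tags and the English-German sample entry are hypothetical and are not taken from the chapter's lexicon format. The point it shows is that a multiword entry carries the same fields as a single-word entry, plus a head annotation and a lemma-and-part-of-speech list for its parts.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Part:
    """One component of a multiword entry (hypothetical layout)."""
    lemma: str
    pos: str


@dataclass
class LexEntry:
    """A bilingual lexicon entry; field names are illustrative assumptions."""
    lemma: str                      # citation form, single word or multiword
    pos: str                        # part of speech of the whole entry
    translation: str                # target-language equivalent
    head: Optional[int] = None      # MWE extension 1: index of the head part
    parts: List[Part] = field(default_factory=list)  # MWE extension 2

    @property
    def is_mwe(self) -> bool:
        # An entry counts as a multiword when it lists component parts.
        return bool(self.parts)

    def head_part(self) -> Optional[Part]:
        # Single-word entries have no head annotation.
        if self.head is None:
            return None
        return self.parts[self.head]


# Hypothetical single-word entry: only the general annotations are filled.
single = LexEntry(lemma="minister", pos="Noun", translation="Minister")

# Hypothetical multiword entry: same fields, extended by head and parts.
multi = LexEntry(
    lemma="prime minister",
    pos="Noun",
    translation="Premierminister",
    head=1,
    parts=[Part("prime", "Adj"), Part("minister", "Noun")],
)
```

Because both kinds of entry share one type, lookup and transfer code can treat a multiword as an elementary unit and consult `head` and `parts` only where analysis needs them.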
Article outline
- 1. Introduction
- 2. Application context
- 3. MWEs in multilingual information processing
- 3.1 MWE extraction
- 3.2 MWE lexical representation
- 3.2.1 Design
- 3.2.2 The lexicon
- 3.2.2.1 General annotations
- 3.2.2.2 MWE extensions
- 3.3 MWE analysis and identification
- 3.3.1 Design
- 3.3.1.1 MWE processing after analysis
- 3.3.1.2 MWE processing before analysis
- 3.3.2 MWE treatment in analysis
- 3.3.2.1 Preprocessing
- 3.3.2.2 Chart initialisation
- 3.3.2.3 Analysis design
- 3.3.2.4 Multiword analysis rules
- 3.3.2.5 Analysis output
- 3.4 MWE translation and generation
- 3.4.1 Transfer
- 3.4.2 Generation
- 4. Evaluation
- 4.1 MWE coverage: rule vs. lexicon compatibility
- 4.2 Lexicon coverage
- 5. Conclusion
- Acknowledgements
- Notes
- References
