In: Recent Advances in Multiword Units in Machine Translation and Translation Technology
Edited by Johanna Monti, Gloria Corpas Pastor, Ruslan Mitkov and Carlos Manuel Hidalgo-Ternero
[Current Issues in Linguistic Theory 366] 2024
pp. 218–242
Chapter 12. A comprehensive Japanese MWE lexicon: JMWEL
Published online: 7 November 2024
https://doi.org/10.1075/cilt.366.12tak
Abstract
JMWEL (Japanese MWE Lexicon) is a comprehensive lexicon of Japanese multiword expressions (MWEs)
with a rich set of grammatical attributes fine-tuned for phrase-based processing of a wide range of Japanese
documents. It contains about 160,000 MWE lemmas covering almost every kind of linguistically idiosyncratic but
commonly used Japanese phrase, e.g., idioms, quasi-idioms, collocations, quasi-collocations, clichés, quasi-clichés,
institutionalized phrases, proverbs, and old sayings, but excluding technical terms in specialized fields and named
entities. JMWEL consists of sixteen sub-lexicons, each reflecting the distinctive features of the MWEs it contains.
Two notable features of JMWEL are the comprehensiveness of its MWE collection and the detailed morpho-syntactic
information given to each MWE, which may include internal modifiers. In this chapter, we introduce the newest version of JMWEL.
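As a rough illustration of how a phrase-level lexicon of this kind can be used, the following Python sketch models a simplified entry with a few field names borrowed from the Section 4 headings in the outline (type label, lemma, constituent morphemes, orthographic variants, syntactic function, interpretation) and uses such entries for greedy longest-match chunking of an already segmented morpheme sequence. The class `MWEEntry`, the helpers `build_index` and `chunk`, the label values, and the longest-match strategy are all hypothetical assumptions for illustration; they are not JMWEL's actual data format or the authors' processing method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MWEEntry:
    """Hypothetical, simplified entry; field names loosely follow the Section 4 headings."""
    type_label: str                 # hypothetical label value, e.g. "idiom"
    lemma: str                      # canonical form of the MWE
    constituents: tuple[str, ...]   # constituent morphemes in surface order
    variants: tuple[str, ...] = ()  # orthographic variants of the lemma
    syntactic_function: str = ""    # hypothetical value, e.g. "verbal"
    interpretation: str = ""        # gloss of the idiomatic meaning


def build_index(entries):
    """Index entries by their first constituent morpheme (hypothetical helper)."""
    index = {}
    for entry in entries:
        index.setdefault(entry.constituents[0], []).append(entry)
    # Try longer MWEs first so that the longest matching entry wins.
    for bucket in index.values():
        bucket.sort(key=lambda e: len(e.constituents), reverse=True)
    return index


def chunk(morphemes, index):
    """Greedy longest-match chunking: replace each matched MWE by its lemma."""
    out = []
    i = 0
    while i < len(morphemes):
        for entry in index.get(morphemes[i], []):
            n = len(entry.constituents)
            if tuple(morphemes[i:i + n]) == entry.constituents:
                out.append(entry.lemma)
                i += n
                break
        else:
            # No entry starts here: keep the single morpheme as is.
            out.append(morphemes[i])
            i += 1
    return out


# Usage with two well-known Japanese idioms (illustrative entries only).
entries = [
    MWEEntry("idiom", "腹を立てる", ("腹", "を", "立てる"),
             syntactic_function="verbal", interpretation="get angry"),
    MWEEntry("idiom", "足を洗う", ("足", "を", "洗う"),
             syntactic_function="verbal", interpretation="quit (a bad way of life)"),
]
index = build_index(entries)
print(chunk(["彼", "は", "腹", "を", "立てる"], index))
```

The sketch matches only contiguous, uninflected constituent sequences. JMWEL's entries also record internal modification, forward and backward context conditions, and inflection, so a realistic matcher would need to consult that information, for example to accept an inflected form of 立てる or a modifier inserted inside the phrase.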
Article outline
- 1. Introduction
- 2. MWE lemma
  - 2.1 Non-compositionality
  - 2.2 Probabilistically idiosyncratic MWEs
  - 2.3 Distribution of MWEs in JMWEL
- 3. Organization of JMWEL
  - 3.1 Sub-lexicons compiled based on grammatical functions
  - 3.2 Sub-lexicons organized by topic
- 4. Data entries in JMWEL
  - 4.1 Type label
  - 4.2 Lemma
  - 4.3 Constituent morphemes
  - 4.4 Orthographic variants
  - 4.5 Syntactic function
  - 4.6 Morpho-syntactic structure
  - 4.7 Internal modification
  - 4.8 Forward context condition
  - 4.9 Backward context condition
  - 4.10 Inflection
  - 4.11 Interpretation
- 5. Applications
- 6. Related work
- 7. Concluding remarks
- Acknowledgements
- Notes
- References
References
Biber, D., Johansson, S., Leech, G., Conrad, S., & Finegan, E. (Eds.) (1999). Longman grammar of spoken and written English. Pearson Education.
Church, K. (2011). How many multiword expressions do people know? Proceedings of the Workshop on Multiword Expressions: From Parsing and Generation to the Real World (MWE 2011) (pp. 137–144). Invited talk.
Constant, M., Eryiğit, G., Monti, J., van der Plas, L., Ramisch, C., Rosner, M., & Todirascu, A. (2017). Multiword expression processing: A survey. Computational Linguistics, 43(4), 837–892.
Corrigan, R., Moravcsik, E. A., Ouali, H., & Wheatley, K. (Eds.) (2009). Formulaic language: Vol. 1. Distribution and historical change. John Benjamins.
Cowie, A. P. (Ed.) (1998). Phraseology: Theory, analysis, and applications (Oxford Studies in Lexicography and Lexicology). Clarendon Press.
Fillmore, C., Kay, P., & O’Connor, M. C. (1988). Regularity and idiomaticity in grammatical constructions: The case of let alone. Language, 64, 501–538.
Hashimoto, C., & Kawahara, D. (2008). Construction of an idiom corpus and its application to idiom identification based on WSD incorporating idiom-specific features. Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing (pp. 992–1001).
Kudo, T., & Kazawa, H. (2009). Japanese web n-gram Version 1. Linguistic Data Consortium, LDC2009T08.
Ramisch, C. (2017). Putting the horses before the cart: Identifying multiword expressions before translation. Proceedings of EUROPHRAS 2017 (MUMTTT 2017) (pp. 69–84). Invited talk.
Sag, I. A., Baldwin, T., Bond, F., Copestake, A., & Flickinger, D. (2002). Multiword expressions: A pain in the neck for NLP. Proceedings of the 3rd International Conference on Intelligent Text Processing and Computational Linguistics (CICLing 2002) (pp. 1–15).
Savary, A., Cordeiro, S. R., & Ramisch, C. (2019). Without lexicons, multiword expression identification will never fly: A position statement. Proceedings of EUROPHRAS 2019 (pp. 79–91).
Shudo, K. (1973). On machine translation from Japanese into English for a technical field. Journal of the Information Processing Society of Japan, 14(9), 661–668. (Japanese title: 専門分野を対象とした日英機械翻訳について).
Shudo, K. (2011). Research studio for Japanese language processing. [URL]
Shudo, K., Fujita, T., & Yoshida, S. (1978). On the processing of annexational expressions in Japanese. Proceedings of the International Conference on Computational Linguistics (COLING 78) (pp. 53.1–53.11).
Tanabe, T., Takahashi, M., & Shudo, K. (2014). A lexicon of multiword expressions for linguistically precise, wide-coverage natural language processing. Computer Speech and Language, 28(6), 1317–1339.
Zaninello, A., & Birch, A. (2020). Multiword expression aware neural machine translation. Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020) (pp. 3816–3825).
