In: The Swedish FrameNet++: Harmonization, integration, method development and practical language technology applications
Edited by Dana Dannélls, Lars Borin and Karin Friberg Heppin
[Natural Language Processing 14] 2021
pp. 221–260
Chapter 9: Multiword expressions – a tough typological nut for Swedish FrameNet++
Available under the Creative Commons Attribution-NonCommercial-NoDerivatives (CC BY-NC-ND) 4.0 license.
For any use beyond this license, please contact the publisher at rights@benjamins.nl.
Published online: 26 November 2021
https://doi.org/10.1075/nlp.14.09bor
Abstract
Multiword expressions have attracted much attention in language technology over the last two decades or so, and in general linguistics, the interest in phraseology – which includes the linguistic study of multiword expressions – goes back much further. In our work on the multilingual components of Swedish FrameNet++, we have strived to adopt a typologically informed view on multiword expressions. This raises a number of theoretical and methodological questions, some of which are discussed in this chapter.
Article outline
- 1. Background
- 2. Multiword expressions in Swedish FrameNet++
- 3. MWEs from a typological perspective: A first cut
- 3.1 The “words” of MWEs
- 3.2 The “lexemes” of MWEs
- 3.3 How frequent are multiword expressions in language?
- 3.3.1 MWEs in the lexicon
- 3.3.2 Which kinds of lexical units should we count?
- 3.3.3 MWEs in texts
- 3.3.4 MWEs and parts of speech
- 3.3.5 Towards a typological generalization?
- 3.3.5.1 Data sources and data commensurability
- 3.3.5.2 Verbs, nouns and other parts of speech in corpora and dictionaries
- 3.3.5.3 Headedness, verb-to-noun ratio, and support verb constructions
- 3.3.5.4 The diachrony of support verb constructions in verb-final languages
- 3.4 What kinds of MWEs are there?
- 3.5 Where do we find cross-linguistic MWE data?
- 4. Taking stock: Towards a typology of MWEs?
- Notes
- References
References (87)
Aikhenvald, Alexandra. 2006. Serial verb constructions in typological perspective. In Alexandra Aikhenvald & R. M. W. Dixon (eds.), Serial verb constructions: A cross-linguistic typology, 1–68. Oxford: Oxford University Press.
Aikhenvald, Alexandra. 2007. Typological distinctions in word-formation. In Timothy Shopen (ed.), Language typology and syntactic description. Volume III: Grammatical categories and the lexicon, 2nd edn., 1–65. Cambridge: Cambridge University Press.
Aikhenvald, Alexandra & R. M. W. Dixon. 2002. Word: A typological framework. In Alexandra Aikhenvald & R. M. W. Dixon (eds.), Word: A cross-linguistic typology, 1–41. Cambridge: Cambridge University Press.
Aikhenvald, Alexandra & R. M. W. Dixon (eds.). 2006. Serial verb constructions: A cross-linguistic typology. Oxford: Oxford University Press.
Amith, Jonathan D. 2002. What’s in a word? The whys and what fors of a Nahuatl dictionary. In William Frawley, Kenneth C. Hill & Pamela Munro (eds.), Making dictionaries: Preserving indigenous languages of the Americas, 219–258. Berkeley: University of California Press.
Bakker, Dik. 2011. Language sampling. In Jae Jung Song (ed.), The Oxford handbook of linguistic typology, 100–127. Oxford: Oxford University Press.
Baldwin, Timothy, Colin Bannard, Takaaki Tanaka & Dominic Widdows. 2003. An empirical model of multiword expression decomposability. In Proceedings of MWE 2003, 89–96. Sapporo: ACL.
Baldwin, Timothy & Su Nam Kim. 2010. Multiword expressions. In Nitin Indurkhya & Fred J. Damerau (eds.), Handbook of natural language processing, 2nd edn., 267–292. Boca Raton: Chapman & Hall/CRC.
Bauer, Laurie. 2009. Typology of compounds. In Rochelle Lieber & Pavol Štekauer (eds.), The Oxford handbook of compounding, 343–356. Oxford: Oxford University Press.
Bickel, Balthasar & Fernando Zúñiga. 2017. The ‘word’ in polysynthetic languages: Phonological and syntactic challenges. In Michael Fortescue, Marianne Mithun & Nicholas Evans (eds.), The Oxford handbook of polysynthesis, 158–185. Oxford: Oxford University Press.
Borin, Lars, Bernard Comrie & Anju Saxena. 2013. The Intercontinental Dictionary Series – a rich and principled database for language comparison. In Lars Borin & Anju Saxena (eds.), Approaches to measuring linguistic differences, 285–302. Berlin: De Gruyter Mouton.
Borin, Lars, Markus Forsberg & Lennart Lönngren. 2013. SALDO: A touch of yin to WordNet’s yang. Language Resources and Evaluation 47(4): 1191–1211.
Bowern, Claire. 2008. The diachrony of complex predicates. Diachronica 25(2): 161–185.
Burger, Harald, Dmitrij Dobrovol’skij, Peter Kühn & Neal R. Norrick (eds.). 2007. Phraseology: An international handbook of contemporary research (2 volumes). Berlin: Walter de Gruyter.
Butt, Miriam. 2010. The light verb jungle: Still hacking away. In Mengistu Amberber, Brett Baker & Mark Harvey (eds.), Complex predicates: Cross-linguistic perspectives on event structure, 48–78. Cambridge: Cambridge University Press.
Chiarcos, Christian, Julia Ritz & Manfred Stede. 2012. By all these lovely tokens… Merging conflicting tokenizations. Language Resources and Evaluation 46(1): 53–74.
Ciancaglini, Claudia A. 2011. The formation of the periphrastic verbs in Persian and neighbouring languages. In Mauro Maggi & Paola Orsatti (eds.), The Persian language in history, 3–31. Wiesbaden: Reichert.
Constant, Mathieu, Gülşen Eryiğit, Johanna Monti, Lonneke van der Plas, Carlos Ramisch, Michael Rosner & Amalia Todirascu. 2017. Multiword expression processing: A survey. Computational Linguistics 43(4): 837–892.
Czuczor, Gergely & János Fogarasi. 1862. A magyar nyelv szótára [Hungarian dictionary]. Pest: Emich Gusztáv Magyar akadémiai nyomdász.
Dixon, R. M. W. 2006. Serial verb constructions: Conspectus and coda. In Alexandra Aikhenvald & R. M. W. Dixon (eds.), Serial verb constructions: A cross-linguistic typology, 338–350. Oxford: Oxford University Press.
Dorais, Louis-Jacques. 2017. The lexicon in polysynthetic languages. In Michael Fortescue, Marianne Mithun & Nicholas Evans (eds.), The Oxford handbook of polysynthesis, 135–157. Oxford: Oxford University Press.
Dridan, Rebecca & Stephan Oepen. 2012. Tokenization: Returning to a long solved problem – a survey, contrastive experiment, recommendations, and toolkit. In Proceedings of ACL 2012, 378–382. Jeju: ACL.
Dryer, Matthew S. & Martin Haspelmath (eds.). 2013. The world atlas of language structures online. Jena: Max Planck Institute for the Science of Human History.
Eberhard, David M., Gary F. Simons & Charles D. Fennig (eds.). 2021. Ethnologue: Languages of the world. 24th edn. Dallas: SIL International.
Erk, Katrin. 2010. What is word meaning, really? (And how can distributional models help us describe it?) In Proceedings of the 2010 Workshop on GEometrical Models of Natural Language Semantics, 17–26. Uppsala: ACL.
Evans, Nicholas. 2011. Semantic typology. In Jae Jung Song (ed.), The Oxford handbook of linguistic typology, 504–533. Oxford: Oxford University Press.
Fortescue, Michael, Marianne Mithun & Nicholas Evans (eds.). 2017. The Oxford handbook of polysynthesis. Oxford: Oxford University Press.
François, Alexandre. 2008. Semantic maps and the typology of colexification: Intertwining polysemous networks across languages. In Martine Vanhove (ed.), From polysemy to semantic change: Towards a typology of lexical semantic associations, 163–215. Amsterdam: John Benjamins.
Gantar, Polina, Carla Parra Escartín & Héctor Martinez Alonso. 2019. Multiword expressions: Between lexicography and NLP. International Journal of Lexicography 32(2): 138–162.
Gibbs, Raymond W., Jr, Nandini P. Nayak & Cooper Cutting. 1989. How to kick the bucket and not decompose: Analyzability and idiom processing. Journal of Memory and Language 28: 576–593.
Gilardi, Luca & Collin Baker. 2018. Learning to align across languages: Toward Multilingual FrameNet. In Proceedings of the International FrameNet workshop at LREC 2018: Multilingual framenets and constructicons, 13–22. Miyazaki: ELRA.
Hakulinen, Auli, Maria Vilkuna, Riitta Korhonen, Vesa Koivisto, Tarja Riitta Heinonen & Irja Alho. 2004. Iso suomen kielioppi [The big Finnish grammar]. Online version at [URL], accessed on 2021-04-22. Helsinki: Suomalaisen Kirjallisuuden Seura.
Hammarström, Harald, Robert Forkel & Martin Haspelmath (eds.). 2020. Glottolog 4.3. Jena: Max Planck Institute for the Science of Human History.
. 2002. Mapping meaning onto use. In Marie-Hélène Corréard (ed.), Lexicography and natural language processing: A Festschrift in honour of B.T.S. Atkins, 156–198. Grenoble: EURALEX.
Haspelmath, Martin. 2010. Comparative concepts and descriptive categories in crosslinguistic studies. Language 86(3): 663–687.
Haspelmath, Martin. 2011a. The European linguistic area: Standard Average European. In Martin Haspelmath, Ekkehard König, Wulf Oesterreicher & Wolfgang Raible (eds.), Language typology and language universals: An international handbook. Vol. 2, 1492–1510. Berlin: Walter de Gruyter.
Haspelmath, Martin. 2011b. The indeterminacy of word segmentation and the nature of morphology and syntax. Folia Linguistica 45(1): 31–80.
Haspelmath, Martin. 2015. The serial verb construction: Comparative concept and cross-linguistic generalizations. Language and Linguistics 17(3): 291–319.
Haspelmath, Martin & Uri Tadmor (eds.). 2009. Loanwords in the world’s languages: A comparative handbook. Berlin: Mouton de Gruyter.
Hewlett, Daniel & Paul Cohen. 2011. Fully unsupervised word segmentation with BVE and MDL. In Proceedings of ACL-HLT 2011, 540–545. Portland: ACL.
Hoffmann, Thomas & Graeme Trousdale (eds.). 2013. The Oxford handbook of construction grammar. Oxford: Oxford University Press.
Jensen, John T. 1990. Morphology: Word structure in generative grammar. Amsterdam: John Benjamins.
Koptjevskaja-Tamm, Maria, Martine Vanhove & Peter Koch. 2007. Typological approaches to lexical semantics. Linguistic Typology 11: 159–185.
Lass, Roger. 1978. Mapping constraints in phonological reconstruction: On climbing down trees without falling out of them. In Jacek Fisiak (ed.), Recent developments in historical phonology, 245–286. Berlin: De Gruyter.
Lieber, Rochelle & Pavol Štekauer (eds.). 2009. The Oxford handbook of compounding. Oxford: Oxford University Press.
Lyngfelt, Benjamin, Linnéa Bäckström, Lars Borin, Anna Ehrlemark & Rudolf Rydstedt. 2018. Constructicography at work: Theory meets practice in the Swedish constructicon. In Benjamin Lyngfelt, Lars Borin, Kyoko Ohara & Tiago Timponi Torrent (eds.), Constructicography: Constructicon development across languages, 41–106. Amsterdam: John Benjamins.
Macdonell, Arthur A. 1893. A Sanskrit-English dictionary: Being a practical handbook with transliteration, accentuation, and etymological analysis throughout. London: Longmans, Green.
Maggi, Mauro & Paola Orsatti. 2018. From Old to New Persian. In Mauro Maggi & Paola Orsatti (eds.), The Oxford handbook of Persian linguistics, 7–51. Oxford: Oxford University Press.
Majid, Asifa, Fiona Jordan & Michael Dunn. 2015. Semantic systems in closely related languages. Language Sciences 49: 1–18.
Markantonatou, Stella, Carlos Ramisch, Agata Savary & Veronika Vincze (eds.). 2018. Multiword expressions at length and in depth: Extended papers from the MWE 2017 workshop. Berlin: Language Science Press.
Matras, Yaron. 2007. The borrowability of structural categories. In Yaron Matras & Jeanette Sakel (eds.), Grammatical borrowing in cross-linguistic perspective, 31–73. Berlin: Walter de Gruyter.
. 2009. Polysynthesis in the Arctic. In Marc-Antoine Mahieu & Nicole Tersis (eds.), Variations on polysynthesis: The Eskimo-Aleut languages, 3–18. Amsterdam: John Benjamins.
Mohammad, Jan & Simin Karimi. 1992. ‘Light’ verbs are taking over: Complex verbs in Persian. In Proceedings of WECOL 1992, 195–212. Fresno: Dept. of Linguistics, California State University, Fresno.
Monier-Williams, Monier. 1899. A Sanskrit-English dictionary: Etymologically and philologically arranged with special reference to cognate Indo-European languages. Oxford: The Clarendon Press.
Munro, Pamela. 2005. From parts of speech to the grammar. Studies in Language 30(2): 307–349.
Nasr, Alexis, Carlos Ramisch, José Deulofeu & André Valli. 2015. Joint dependency parsing and multiword expression tokenization. In Proceedings of ACL/IJCNLP 2015, 1116–1126. Beijing: ACL.
Nida, Eugene A. 1949. Morphology: The descriptive analysis of words. Ann Arbor: University of Michigan Press.
Nivre, Joakim & Jens Nilsson. 2004. Multiword units in syntactic parsing. In Proceedings of MEMURA at LREC 2004, 39–46. Lisbon: ELRA.
Parmentier, Yannick & Jakub Waszczuk (eds.). 2019. Representation and parsing of multiword expressions: Current trends. Berlin: Language Science Press.
Pawley, Andrew. 1993. A language which defies description by ordinary means. In William A. Foley (ed.), The role of theory in language description, 87–129. Berlin: Mouton de Gruyter.
Pecina, Pavel. 2010. Lexical association measures and collocation extraction. Language Resources and Evaluation 44(1–2): 137–158.
Piirainen, Elisabeth. 2012. Widespread idioms in Europe and beyond: Toward a lexicon of common figurative units. New York: Peter Lang.
Polinsky, Maria. 2012. Headedness, again. In Thomas Graf, Denis Paperno, Anna Szabolcsi & Jos Tellings (eds.), Theories of everything. In honor of Ed Keenan (UCLA Working Papers in Linguistics), 348–359. Los Angeles: UCLA Department of Linguistics.
Polinsky, Maria & Lilla Magyar. 2020. Headedness and the lexicon: The case of verb-to-noun ratios. Languages 5(1/9): 1–25.
Sag, Ivan, Timothy Baldwin, Francis Bond, Ann Copestake & Dan Flickinger. 2002. Multiword expressions: A pain in the neck for NLP. In Alexander Gelbukh (ed.), Computational linguistics and intelligent text processing: Third international conference: Cicling-2002, 1–15. Berlin: Springer.
Sailer, Manfred & Stella Markantonatou (eds.). 2018. Multiword expressions: Insights from a multi-lingual perspective. Berlin: Language Science Press.
Sansó, Andrea. 2011. Mediterranean languages. In Bernd Kortmann & Johan van der Auwera (eds.), The languages and linguistics of Europe: A comprehensive guide, 341–356. Berlin: De Gruyter Mouton.
Savary, Agata, Marie Candito, Verginica Barbu Mititelu, Eduard Bejček, Fabienne Cap, Slavomír Čéplö, Silvio Ricardo Cordeiro, Gülşen Eryiğit, Voula Giouli, Maarten van Gompel, Yaakov HaCohen-Kerner, Jolanta Kovalevskaitė, Simon Krek, Chaya Liebeskind, Johanna Monti, Carla Parra Escartín, Lonneke van der Plas, Behrang QasemiZadeh, Carlos Ramisch, Federico Sangati, Ivelina Stoyanova & Veronika Vincze. 2018. PARSEME multilingual corpus of verbal multiword expressions. In Stella Markantonatou, Carlos Ramisch, Agata Savary & Veronika Vincze (eds.), Multiword expressions at length and in depth: Extended papers from the MWE 2017 workshop, 87–147. Berlin: Language Science Press.
Savary, Agata, Silvio Ricardo Cordeiro, Timm Lichte, Carlos Ramisch, Uxoa Iñurrieta & Voula Giouli. 2019. Literal occurrences of multiword expressions: Rare birds that cause a stir. The Prague Bulletin of Mathematical Linguistics (112): 5–54.
Schulte im Walde, Sabine & Eva Smolka (eds.). 2020. The role of constituents in multiword expressions: An interdisciplinary, cross-lingual perspective. Berlin: Language Science Press.
Schultze-Berndt, Eva. 2006. Taking a closer look at function verbs: Lexicon, grammar, or both? In Felix K. Ameka, Alan Dench & Nicholas Evans (eds.), Catching language: The standing challenge of grammar writing, 359–391. Berlin: Mouton de Gruyter.
Silveira, Natalia & Christopher D. Manning. 2015. Does Universal Dependencies need a parsing representation? An investigation of English. In Proceedings of Depling 2015, 310–319. Uppsala: ACL.
Skorik, Pëtr J. 1961. Grammatika čukotskogo jazyka: Čast’ pervaja [Chukchi grammar: Part I]. Moscow: Izdatel’stvo Akademii Nauk SSSR.
Taljard, Elsabé & Sonja E. Bosch. 2006. A comparison of approaches to word class tagging: Disjunctively vs. conjunctively written Bantu languages. Nordic Journal of African Studies 15(4): 428–442.
Tanaka, Takaaki & Timothy Baldwin. 2003. Noun-noun compound machine translation: A feasibility study on shallow processing. In Proceedings of MWE 2003, 17–24. Sapporo: ACL.
Teleman, Ulf, Staffan Hellberg & Erik Andersson. 1999. Svenska Akademiens grammatik [The Swedish Academy grammar]. Stockholm: Norstedts.
van der Auwera, Johan. 2012. From contrastive linguistics to linguistic typology. Languages in Contrast 12(1): 69–86.
