In: Computational Phraseology
Edited by Gloria Corpas Pastor and Jean-Pierre Colson
[IVITRA Research in Linguistics and Literature 24] 2020
pp. 225–246
Empirical variability of Italian multiword expressions as a useful feature for their categorisation
Published online: 8 May 2020
https://doi.org/10.1075/ivitra.24.12squ
Abstract
In contemporary linguistics the definition of those entities
which are referred to as multiword expressions (MWEs) remains controversial.
It is intuitively clear that some words, when appearing together, have some
“special bond” in terms of meaning (e.g. black hole, mountain chain), or
lexical choice (e.g. strong tea, to fill a form), unlike free
combinations. Nevertheless, the great variety of features and anomalous
behaviours that these expressions exhibit makes it difficult to organise
them into categories and has given rise to a large body of divergent and
sometimes overlapping terminology.
So far, most approaches in corpus linguistics have focused on
trying to automatically extract MWEs from corpora by using statistical
association measures, while theoretical aspects related to their definition,
typology and behaviours arising from quantitative corpus-based studies have
not been widely explored, especially for languages with a rich morphology
and relatively free word order, such as Italian.
This contribution shows that a systematic analysis of the
empirical behaviour of Italian MWEs in large corpora, with respect to
several parameters such as syntactic and lexical variation, is useful for
outlining a categorisation of the expressions into homogeneous sets which
approximately correspond to what is intuitively known as multiword units
(“polirematiche” in the Italian lexicographic tradition) and lexical
collocations. The value of this kind of approach is that the resulting
categorisation of MWEs is grounded in empirical data rather than in
intuitive and not always coherent linguistic definitions.
The variational features taken into account are (1) the
possibility for the expressions to be syntactically transformed, and (2) the
possibility for one of the components to be replaced with a synonym. Given
an annotated corpus and a list of expressions, these features can be
investigated automatically and quantitatively using purpose-built tools,
whose methodology is fully explained. It
is possible to show that the kind of attested variations and the magnitude
of variation appear highly correlated with the grammatical structure of a
given phrase, indicating that the bond between the components of a
multiword unit or a lexical collocation can be formed by activating
different kinds of restrictions, depending on the grammatical pattern
considered.
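As a rough illustration of how the two variational features could be quantified, the following Python sketch computes a syntactic and a lexical variability ratio over a toy lemmatised corpus. It is not the chapter's tool: the function names, the gap-based notion of syntactic variation, the hand-written synonym pairs and the toy sentences are all assumptions made for the example.

```python
from typing import List, Sequence, Tuple

Sentence = Sequence[str]  # one sentence as a list of lemmas (assumed input format)


def count_occurrences(sentences: List[Sentence], first: str, second: str,
                      max_gap: int = 0) -> Tuple[int, int]:
    """Return (adjacent, gapped) counts of `first` followed by `second`.

    adjacent: `second` immediately follows `first`.
    gapped:   between 1 and `max_gap` other tokens intervene.
    """
    adjacent = gapped = 0
    for sent in sentences:
        for i, tok in enumerate(sent):
            if tok != first:
                continue
            window = sent[i + 1: i + 2 + max_gap]
            if second in window:
                if window[0] == second:
                    adjacent += 1
                else:
                    gapped += 1
    return adjacent, gapped


def syntactic_variability(sentences: List[Sentence], first: str, second: str,
                          max_gap: int = 2) -> float:
    """Share of occurrences in which the two components are not adjacent."""
    adjacent, gapped = count_occurrences(sentences, first, second, max_gap)
    total = adjacent + gapped
    return gapped / total if total else 0.0


def lexical_variability(sentences: List[Sentence], canonical: Tuple[str, str],
                        variants: List[Tuple[str, str]]) -> float:
    """Share of occurrences in which a component is replaced by a synonym.

    `variants` is a hypothetical, hand-written list of synonym-substituted pairs.
    """
    base = count_occurrences(sentences, *canonical)[0]
    substituted = sum(count_occurrences(sentences, *v)[0] for v in variants)
    total = base + substituted
    return substituted / total if total else 0.0


# Toy lemmatised corpus (invented for illustration only).
corpus = [
    ["the", "black", "hole", "emit", "radiation"],
    ["a", "black", "hole", "form"],
    ["she", "drink", "strong", "tea"],
    ["strong", "and", "hot", "tea"],
    ["they", "buy", "a", "big", "house"],
    ["a", "large", "house", "stand", "there"],
]

print(syntactic_variability(corpus, "black", "hole"))   # 0.0
print(syntactic_variability(corpus, "strong", "tea"))   # 0.5
print(lexical_variability(corpus, ("strong", "tea"), [("powerful", "tea")]))  # 0.0
print(lexical_variability(corpus, ("big", "house"), [("large", "house")]))    # 0.5
```

On this toy data the fixed expression never varies, the collocation admits an insertion but no synonym, and the free combination accepts a synonym, which is the kind of contrast such ratios are meant to expose. A realistic version would also need part-of-speech or dependency annotation to detect transformations other than insertion.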
Article outline
- 1. Introduction
- 2. Anomalous behaviours of Italian Multiword Expressions
- 3. A quantitative approach to MWEs
- 3.1 Reasons to go beyond statistics
- 3.2 Reasons for an empirical, quantitative approach to MWEs
- 4. Methodology
- 4.1 Syntactic variations
- 4.2 Lexical variations
- 4.3 Inflectional variations
- 5. Analysis and results
- 6. Conclusion
