Article published in: Diachronica
Vol. 40:2 (2023), pp. 153–194
Tracing semantic change with distributional methods
The contexts of algo
Published online: 8 November 2022
https://doi.org/10.1075/dia.21012.ama
Abstract
This paper uses the tools of distributional semantics to investigate the semantic change of algo from a noun meaning ‘goods, possessions’ and an indefinite pronoun ‘something’ in the Medieval/Classical period of Spanish to an indefinite pronoun and degree adverb ‘a bit’ in contemporary Spanish. We compare the results of a previous corpus-based study (Amaral, Patrícia. 2016. When something becomes a bit. Diachronica 33(2). 151–186.) on the semantic change of algo with an analysis using word embeddings models with two goals: (i) to show how word embeddings can help identify different synchronic values of a word, and (ii) to provide measures of change through distributional semantic methods. We discuss the challenges of a study with this methodology using limited data from older periods of a language, hence putting into focus decisions that have to be made and their implications for the analysis. In this way, we hope to contribute to a fruitful integration of more traditional studies in diachronic semantics with the methodology of word embeddings.
Keywords: semantic change, word embeddings, corpora, Spanish
Résumé
Cet article porte sur l’étude du changement sémantique de algo, à l’origine un nom signifiant ‘biens, avoirs’ devenu pronom indéfini, ‘quelque chose’, pendant les périodes médiévale et classique, jusqu’à son fonctionnement comme pronom indéfini et adverbe de degré, ‘un peu’, en espagnol contemporain, en utilisant les outils de la sémantique distributionnelle. Nous comparons les résultats d’une étude antérieure sur ce changement sémantique (Amaral, Patrícia. 2016. When something becomes a bit. Diachronica 33(2). 151–186.) avec notre analyse construite via des modèles de plongements lexicaux. Nous avons deux objectifs: (i) démontrer que ces modèles nous permettent d’identifier les différentes valeurs synchroniques d’un mot, et (ii) donner des mesures de changement sémantique en utilisant les méthodes de la sémantique distributionnelle. Nous parlons des difficultés d’une telle étude avec ces méthodes quand il n’y a que des données en quantité limitée sur les états de langue plus anciens, mettant ainsi en évidence les décisions à prendre et leurs implications pour notre analyse. Ce faisant, nous enrichissons dans une certaine mesure des travaux en sémantique diachronique plus traditionnels par l’emploi d’une méthodologie reposant sur les plongements lexicaux.
Zusammenfassung
Dieser Artikel untersucht mittels Methoden der distributionellen Semantik den semantischen Sprachwandel von algo vom Nomen (‘Gut, Besitz’) und Indefinitpronomen (‘etwas’) in der mittelalterlichen und klassischen Periode zum Indefinitpronomen und Gradadverb (‘ein bisschen’) im modernen Spanisch. Wir vergleichen Ergebnisse einer vorhergehenden korpusbasierten Studie (Amaral, Patrícia. 2016. When something becomes a bit. Diachronica 33(2). 151–186.) und einer Analyse basierend auf Worteinbettungen mit zwei Zielen: (i) um den Beitrag von Worteinbettungen zur Identifizierung synchroner Bedeutungen eines Wortes herauszuarbeiten und (ii) um Metriken für den Sprachwandel basierend auf Methoden der distributionellen Semantik vorzulegen. Wir diskutieren die Herausforderungen, die diese Methodologie bewältigen muss, wenn nur eine sehr beschränkte Menge an Daten der älteren Perioden einer Sprache zur Verfügung steht, d.h. wir zeigen die notwendigen Entscheidungen und deren Einflüsse auf die Ergebnisse auf. Wir hoffen, dadurch zu einer sinnvollen Integration traditioneller Studien zur diachronen Semantik und der Methodologie der Worteinbettungen beizutragen.
Article outline
- 1. Introduction
- 2. Background and rationale
  - 2.1 Previous work on semantic change
  - 2.2 Studying meaning change with distributional methods
  - 2.3 Previous research on algo
- 3. Corpora and processing of the data
  - 3.1 Modern Spanish: Spanish Billion Word Corpus (SBW)
  - 3.2 Medieval and Classical Spanish: Chronicles Corpus
  - 3.3 Processing of the data
- 4. Methodology
  - 4.1 Representing meaning in word embedding models
    - 4.1.1 Embeddings models used in this study
      - Singular Value Decomposition
      - Skip-Gram with Negative Sampling
      - Global Vectors for Word Representation
    - 4.1.2 Using word embeddings to determine semantic neighbors
  - 4.2 Visualizations with t-SNE
- 5. Results
  - 5.1 Neighbors of algo in Chronicles and SBW
  - 5.2 Comparison between algo and two nouns, using the t-SNE visualization
- 6. Analysis of nearest neighbors
  - 6.1 Studying algo with word embeddings
  - 6.2 Comparison with previous work
- 7. Conclusion
- Notes
- Abbreviations
References
Boleda, Gemma. 2020. Distributional semantics and linguistic theory. Annual Review of Linguistics 6. 13.1–13.22.
Boleda, Gemma & Aurélie Herbelot. 2016. Formal distributional semantics: Introduction to the special issue. Computational Linguistics 42(4). 619–635.
Bybee, Joan, Revere Perkins & William Pagliuca. 1994. The evolution of grammar: Tense, aspect, and modality in the languages of the world. Chicago: University of Chicago Press.
Campbell, Lyle. 2013. Historical linguistics: An introduction. 3rd edn. Cambridge, MA: The MIT Press.
Cardellino, Cristian. 2019. Spanish billion words corpus and embeddings. [URL]
Church, Kenneth Ward & Patrick Hanks. 1990. Word association norms, mutual information, and lexicography. Computational Linguistics 16(1). 22–29. [URL]
Clark, Stephen. 2015. Vector space models of lexical meaning. In Shalom Lappin & Chris Fox (eds.), The handbook of contemporary semantic theory, 493–522. London: Wiley.
Condoravdi, Cleo & Ashwini Deo. 2014. Aspect shifts in Indo-Aryan and trajectories of semantic change. In Chiara Gianollo, Agnes Jäger & Doris Penka (eds.), Language change at the syntax-semantics interface, 261–292. Berlin: Mouton de Gruyter.
Cornillie, Bert. 2007. Evidentiality and epistemic modality in Spanish (semi-)auxiliaries: A cognitive-functional approach. Berlin: De Gruyter.
Corominas, Joan & José A. Pascual. 1980–1991. Diccionario crítico etimológico castellano e hispánico. Gredos.
Davies, Mark. 2001. Corpus del Español. [URL]
De Cesare, Anna-Maria. 2017. Introduction: On ‘additivity’ as a multidisciplinary research field. In Anna-Maria De Cesare & Cecilia Andorno (eds.), Focus on additivity, 1–22. John Benjamins.
Dubossarsky, Haim, Simon Hengchen, Nina Tahmasebi & Dominik Schlechtweg. 2019. Time-out: Temporal referencing for robust modeling of lexical semantic change. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, 457–470. Florence, Italy. [URL].
Eberenz, Rolf. 1994. Enlaces conjuntivos y adjuntos de sentido aditivo del español preclásico: otrosí, eso mismo, asimismo, demás, también, aun, etc. Iberoromania 39. 1–20.
Eckardt, Regine. 2006. Meaning change in grammaticalization. An enquiry into semantic reanalysis. Oxford: Oxford University Press.
Espinosa Elorza, Rosa María. 2018. La formación de los marcadores sumativos en español. Desde sobresto hasta a mayores. Estudios Humanísticos Filología 40. 95–118.
Fernández-Ordóñez, Inés. 2016. De más (demás), demasiado: la historia de dos cuantificadores contemplada desde la dialectología. In A. López Serena, A. Narbona Jiménez & S. del Rey Quesada (eds.), El español a través del tiempo: Estudios ofrecidos a Rafael Cano Aguilar, 477–496. Sevilla: Universidad de Sevilla.
Finkelstein, Lev, Evgeniy Gabrilovich, Yossi Matias, Ehud Rivlin, Zach Solan, Gadi Wolfman & Eytan Ruppin. 2001. Placing search in context: The concept revisited. In Proceedings of the 10th International World Wide Web Conference, 406–414.
Frermann, Lea & Mirella Lapata. 2016. A Bayesian model of diachronic meaning change. Transactions of the Association for Computational Linguistics 4. 31–45.
Gago Jover, Francisco (ed.). 2011. Spanish Chronicle Texts. Digital Library of Old Spanish Texts. Hispanic Seminary of Medieval Studies. [URL]
Gergel, Remus & Jonathan Watkins (eds.). 2020. Quantification and scales in change. Language Science Press.
GITHE, Universidad de Alcalá. 2015. Corpus de Documentos Españoles Anteriores a 1800. [URL]
Giulianelli, Mario, Marco Del Tredici & Raquel Fernández. 2020. Analysing lexical semantic change with contextualised word representations. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 3960–3973. Online. [URL]
Golub, Gene H. & Christian Reinsch. 1971. Singular value decomposition and least squares solutions. In F. L. Bauer, A. S. Householder, F. W. J. Olver, H. Rutishauser, K. Samelson & E. Stiefel (eds.), Handbook for automatic computation, vol. II: Linear algebra, 134–151. Springer.
Hamilton, William L., Jure Leskovec & Dan Jurafsky. 2016. Diachronic word embeddings reveal statistical laws of semantic change. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, 1489–1501. Berlin, Germany. [URL]
Hellrich, Johannes. 2019. Word embeddings: Reliability and semantic change. Jena University Language and Information Engineering Lab dissertation.
Hu, Hai, Patrícia Amaral & Sandra Kübler. 2021. Word embeddings and semantic shifts in historical Spanish: Methodological considerations. Digital Scholarship in the Humanities 37(2). 441–461.
Hu, Renfen, Shen Li & Shichen Liang. 2019. Diachronic sense modeling with deep contextualized word embeddings: An ecological view. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, 3899–3908. Florence, Italy. [URL]
Jurafsky, Daniel & James H. Martin. 2019. Speech and Language Processing. 3rd edn. Online at [URL]; retrieved April 2020.
Keniston, Hayward. 1937. The syntax of Castilian prose: The sixteenth century. Chicago: The University of Chicago Press.
Kutuzov, Andrey, Lilja Øvrelid, Terrence Szymanski & Erik Velldal. 2018. Diachronic word embeddings and semantic shifts: A survey. In Proceedings of the 27th International Conference on Computational Linguistics, 1384–1397.
Landauer, Thomas, Peter Foltz & Darrell Laham. 1998. An introduction to latent semantic analysis. Discourse Processes 25(2&3). 259–284.
Lenci, Alessandro. 2018. Distributional models of word meaning. Annual Review of Linguistics 4(1). 151–171.
Levy, Omer, Yoav Goldberg & Ido Dagan. 2015. Improving distributional similarity with lessons learned from word embeddings. Transactions of the Association for Computational Linguistics 3. 211–225.
Luo, Yiwei, Dan Jurafsky & Beth Levin. 2019. From insanely jealous to insanely delicious: Computational models for the semantic bleaching of English intensifiers. In Proceedings of the 1st International Workshop on Computational Approaches to Historical Language Change, 1–13. Florence, Italy. [URL].
van der Maaten, Laurens & Geoffrey Hinton. 2008. Visualizing data using t-SNE. Journal of Machine Learning Research 9. 2579–2605.
Mikolov, Tomas, Kai Chen, Greg Corrado & Jeffrey Dean. 2013. Efficient estimation of word representations in vector space. In Proceedings of the International Conference on Learning Representations (ICLR).
Pennington, Jeffrey, Richard Socher & Christopher D. Manning. 2014. GloVe: Global vectors for word representation. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 1532–1543. Doha, Qatar.
Poplack, Shana & Sali Tagliamonte. 2000. The grammaticization of going to in African American English. Language Variation and Change 11(3). 315–342.
Real Academia Española. n.a. Corpus Diacrónico del Español. [URL]
Rodda, Martina, Marco Senaldi & Alessandro Lenci. 2017. Panta Rei: Tracking semantic change with distributional semantics in Ancient Greek. Italian Journal of Computational Linguistics 3(1). 11–24.
Rodríguez Ramalle, Teresa María. 2001. Observaciones sobre el uso de los adjetivos y los adverbios en -mente con valor de grado en español. Español Actual 75. 33–43.
Rosenfeld, Alex & Katrin Erk. 2018. Deep neural models of semantic shift. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 474–484. New Orleans, Louisiana. [URL]
Sagi, Eyal, Stefan Kaufmann & Brady Clark. 2012. Tracing semantic change with Latent Semantic Analysis. In Kathryn Allan & Justyna Robinson (eds.), Current methods in historical semantics, 161–183. Berlin: Mouton de Gruyter.
Sánchez López, Cristina. 1999. Los cuantificadores: Clases de cuantificadores y estructuras cuantificativas. In Ignacio Bosque & Violeta Demonte (eds.), Gramática descriptiva de la lengua española, vol. 1, cap. 16. Espasa-Calpe.
Sánchez-Martínez, Felipe, Isabel Martínez-Sempere, Xavier Ivars-Ribes & Rafael C. Carrasco. 2013. An open diachronic corpus of historical Spanish. Language Resources and Evaluation 47(4). 1327–1342.
Sauerland, Uli & Penka Stateva. 2007. Scalar vs. epistemic vagueness: Evidence from approximators. In Proceedings of the 17th Semantics and Linguistic Theory conference (SALT), 228–245. University of Connecticut. [URL].
Stern, Gustaf. 1921. Swift, swiftly, and their synonyms. A contribution to semantic analysis and theory. Göteborg: Wettergren & Kerber.
Tang, Xuri. 2018. A state-of-the-art of semantic change computation. Natural Language Engineering 24(5). 649–676.
Torres Cacoullos, Rena. 2012. Grammaticalization through inherent variability. Studies in Language 36(1). 73–122.
