In: The Swedish FrameNet++: Harmonization, integration, method development and practical language technology applications
Edited by Dana Dannélls, Lars Borin and Karin Friberg Heppin
[Natural Language Processing 14] 2021
pp. 97–122
Chapter 4. A lexical resource for computational historical linguistics
Available under the Creative Commons Attribution-NonCommercial-NoDerivatives (CC BY-NC-ND) 4.0 license.
For any use beyond this license, please contact the publisher at rights@benjamins.nl.
Published online: 26 November 2021
https://doi.org/10.1075/nlp.14.04ade
Abstract
In this chapter we present the diachronic dimension of Swedish FrameNet++. We describe the historical lexical resources currently available for Swedish, all of which are linked to the contemporary Swedish lexicon Saldo. We present a case study of how interlinking the dictionaries allows us to study lexical change. We also present a method for linking text words to lexicon entries, which facilitates interactive exploration of historical texts. Diachronic language resources are both a high-variation challenge from a wider language technology perspective and an interesting object of linguistic study. Although parts of the diachronic lexical macroresource still need a number of improvements, the resource is invaluable for analysing and accessing historical texts, as well as for both synchronic historical and diachronic lexical studies.
Article outline
- 1. Introduction
- 2. A brief overview of Swedish language stages
- 3. Diachronical lexical resources
- 3.1 Adding diachronical lexicons to SweFN++
- 3.2 A lexical resource for Late Modern Swedish
- 3.3 A lexical resource for Early Modern Swedish
- 3.4 A lexical resource for Old Swedish
- 4. Diapivot
- 4.1 Methods of automatically linking lexical resources
- 4.2 An application: Studying lexical change and grammaticalization
- 5. Spelling variation and linking texts to lexicons
- 5.1 A noisy channel approach to lemmatization
- 5.2 Training a model on dictionary data
- 5.3 Evaluation
- 5.4 An application: FSvReader
- 6. Conclusions
