In: The Swedish FrameNet++: Harmonization, integration, method development and practical language technology applications
Edited by Dana Dannélls, Lars Borin and Karin Friberg Heppin
[Natural Language Processing 14] 2021
pp. 139–166
Chapter 6. Swedish FrameNet++ and comparative linguistics
Available under the Creative Commons Attribution-NonCommercial-NoDerivatives (CC BY-NC-ND) 4.0 license.
For any use beyond this license, please contact the publisher at rights@benjamins.nl.
Published online: 26 November 2021
https://doi.org/10.1075/nlp.14.06bor
Abstract
In this chapter we describe a multilingual extension of Swedish
FrameNet++ intended to address broad comparative research questions
in genealogical, areal and typological linguistics. We focus on the
integration into Swedish FrameNet++ of so-called core vocabularies,
which several linguistic subfields use to conduct large-scale
comparative studies involving many languages. Specifically, we
describe the inclusion of two such lexical databases covering several
hundred South Asian languages, with the aim of investigating areal and
genealogical connections among these languages.
Article outline
- 1. The multilingual aspects of Swedish FrameNet++
- 2. Core vocabularies for comparative linguistic studies
  - 2.1 Basic vocabularies in linguistics
    - 2.1.1 Semantic simplicity
    - 2.1.2 Early acquisition/commonness/representativeness/frequency
    - 2.1.3 Resistance to replacement
    - 2.1.4 Related kinds of vocabularies
  - 2.2 The composition and size of core vocabularies
    - 2.2.1 The “words” of core vocabularies
    - 2.2.2 Selecting core vocabulary concepts
    - 2.2.3 Selecting core-vocabulary lexemes
    - 2.2.4 Comparing core vocabularies
    - 2.2.5 Conclusions
- 3. Two lexical databases for investigation of South Asian linguistic diversity and unity
  - 3.1 Linguistic diversity in South Asia
  - 3.2 Grierson’s comparative vocabulary in Swedish FrameNet++
    - 3.2.1 The Linguistic survey of India
    - 3.2.2 The LSI comparative vocabulary
  - 3.3 The Intercontinental Dictionary Series as a comparative linguistic research tool
    - 3.3.1 A lexical basis for investigating the linguistic landscape of the Indian Himalayas
    - 3.3.2 The IDS – not just another Swadesh list
- 4. Conclusion and future prospects
