In: Mathematical Modelling in Linguistics and Text Analysis: Theory and applications
Edited by Adam Pawłowski, Sheila Embleton, Jan Mačutek and Aris Xanthos
[Current Issues in Linguistic Theory 370] 2025
pp. 161–172
Linguistic correlates of semantic knowledge ontologies
An example of UDC and DDC classification
Published online: 13 October 2025
https://doi.org/10.1075/cilt.370.14paw
Abstract
This chapter provides a comparative analysis of bibliographic corpora including titles extracted from
large national bibliographies (Czech, Finnish, German, Norwegian, and Polish). From the examined corpora, subsets were
obtained, corresponding to the basic categories of the DDC/UDC formal ontologies (Dewey Decimal Classification and Universal
Decimal Classification). The most relevant sets of keywords were then generated from these subsets and projected onto a common
semantic space using automatic translation. The study revealed the existence of common ‘European’ sets of terms, corresponding
to the large semantic domains defined in the DDC/UDC ontology.
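
The pipeline summarised in the abstract (split titles by top-level DDC/UDC class, extract keywords per class, project them onto a shared vocabulary, compare across languages) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the chapter: the toy records, the TF-IDF-style keyword score, the function names `top_class`, `keywords_by_class` and `shared_terms`, and the small hand-written `lexicon` that stands in for automatic translation.

```python
import math
from collections import Counter, defaultdict


def top_class(code):
    """Top-level class of a DDC/UDC notation, e.g. '530.1' -> '5' (assumed convention)."""
    return code.strip()[0]


def keywords_by_class(records, k=1):
    """Rank title words per top-level class with a TF-IDF-style score (illustrative only)."""
    counts = defaultdict(Counter)
    for code, title in records:
        counts[top_class(code)].update(title.lower().split())
    n_classes = len(counts)
    # Number of classes in which each word occurs.
    df = Counter()
    for words in counts.values():
        df.update(words.keys())
    result = {}
    for cls, words in counts.items():
        score = {w: tf * math.log(1 + n_classes / df[w]) for w, tf in words.items()}
        # Sort by descending score, then alphabetically to keep ties deterministic.
        ranked = sorted(score, key=lambda w: (-score[w], w))
        result[cls] = ranked[:k]
    return result


def shared_terms(per_language, lexicon):
    """Map each language's keywords into one vocabulary and intersect them per class."""
    common = set.intersection(*(set(kw) for kw in per_language.values()))
    shared = {}
    for cls in sorted(common):
        projected = [
            {lexicon.get((lang, w), w) for w in kw[cls]}
            for lang, kw in per_language.items()
        ]
        shared[cls] = set.intersection(*projected)
    return shared


# Toy (classification code, title) records; hypothetical, not from the corpora.
polish = [("53", "fizyka kwantowa"), ("530.1", "fizyka teoretyczna"),
          ("94", "historia polski"), ("940", "historia europy")]
german = [("530", "physik heute"), ("53", "theoretische physik"),
          ("943", "geschichte deutschlands"), ("94", "geschichte europas")]

# Hand-written stand-in for automatic translation into a common language.
lexicon = {("pl", "fizyka"): "physics", ("de", "physik"): "physics",
           ("pl", "historia"): "history", ("de", "geschichte"): "history"}

shared = shared_terms(
    {"pl": keywords_by_class(polish), "de": keywords_by_class(german)}, lexicon
)
print(shared)
```

In this toy setting the two languages share one translated keyword in each of the classes 5 and 9. A real study would need lemmatisation, stop-word filtering and a proper translation step, none of which are modelled here.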
Article outline
- 1. Introduction
- 2. Research material
- 3. Goals and hypotheses
- 4. Previous research
- 5. Methods applied
- 6. Results
- 7. Conclusions and discussion
