In: Language and Text: Data, models, information and applications
Edited by Adam Pawłowski, Jan Mačutek, Sheila Embleton and George Mikros
[Current Issues in Linguistic Theory 356] 2021
pp. 209–224
Topological mapping for visualisation of high-dimensional historical linguistic data
Published online: 22 December 2021
https://doi.org/10.1075/cilt.356.14moi
Abstract
This paper addresses an issue in visualization of high-dimensional data abstracted from historical corpora whose importance in quantitative and corpus linguistics has thus far not been sufficiently appreciated: the possibility that the data is nonlinear. Most applications of data visualization in these fields use linear proximity measures which ignore nonlinearity, and, if the data is significantly nonlinear, can give misleading results. Topological mapping is a nonlinear visualization method, and its application via a particular topological mapping method, the Self-Organizing Map, is here exemplified with reference to a small historical text corpus.
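For readers unfamiliar with the Self-Organizing Map named in the abstract, the following is a minimal generic sketch of the technique in Python with NumPy. It is not the chapter's implementation: the function names (`train_som`, `best_matching_unit`), the grid size, the linearly decaying learning rate and neighbourhood width, and the random toy data are all assumptions made for illustration only.

```python
import numpy as np


def best_matching_unit(weights, x):
    """Return the (row, col) grid position of the unit whose weight vector is closest to x."""
    dists = np.linalg.norm(weights - x, axis=-1)
    return np.unravel_index(np.argmin(dists), dists.shape)


def train_som(data, rows=4, cols=4, epochs=30, lr0=0.5, sigma0=2.0, seed=0):
    """Train a small rectangular SOM; returns weights of shape (rows, cols, dim).

    All hyperparameters are illustrative defaults, not values from the chapter.
    """
    rng = np.random.default_rng(seed)
    n, dim = data.shape
    weights = rng.random((rows, cols, dim))
    # Grid coordinates of every unit, shape (rows, cols, 2).
    grid = np.stack(
        np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1
    )
    total_steps = epochs * n
    step = 0
    for _ in range(epochs):
        for x in data[rng.permutation(n)]:
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                 # learning rate decays towards 0
            sigma = sigma0 * (1.0 - frac) + 0.5     # neighbourhood shrinks over time
            bmu = best_matching_unit(weights, x)
            # Squared distance of every unit from the winner, measured on the grid.
            grid_dist2 = np.sum((grid - np.array(bmu)) ** 2, axis=-1)
            h = np.exp(-grid_dist2 / (2.0 * sigma ** 2))
            # Pull the winner and its grid neighbours towards the input vector.
            weights += lr * h[..., None] * (x - weights)
            step += 1
    return weights


if __name__ == "__main__":
    # Random toy vectors stand in for real high-dimensional data.
    toy = np.random.default_rng(1).random((40, 6))
    som = train_som(toy)
    for i in range(3):
        print(i, best_matching_unit(som, toy[i]))
```

Each input vector is assigned to a position on the two-dimensional grid. Because neighbouring units are updated together, vectors that are close in the high-dimensional space tend to land on nearby grid positions, which is what makes the trained map usable as a visualization.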
Article outline
- 1. Introduction
- 2. Nonlinearity
  - 2.1 Nonlinearity in natural processes
  - 2.2 Nonlinearity in data
  - 2.3 Nonlinearity in linguistic data
- 3. The problem
- 4. Topological mapping
  - 4.1 Topology
  - 4.2 Projection of topological structure into low-dimensional space
  - 4.3 Preservation of nonlinearity
  - 4.4 Example
    - 4.4.1 The text collection
    - 4.4.2 Spelling data
    - 4.4.3 The Self-Organizing Map
    - 4.4.4 Result
- 5. Conclusion
