Topological mapping for visualisation of high-dimensional historical linguistic data

Moisl, Hermann

doi:10.1075/cilt.356.14moi

In: Language and Text: Data, models, information and applications
Edited by Adam Pawłowski, Jan Mačutek, Sheila Embleton and George Mikros
[Current Issues in Linguistic Theory 356] 2021
pp. 209–224

Hermann Moisl | Newcastle University

Published online: 22 December 2021

https://doi.org/10.1075/cilt.356.14moi

Abstract

This paper addresses an issue in visualization of high-dimensional data abstracted from historical corpora whose importance in quantitative and corpus linguistics has thus far not been sufficiently appreciated: the possibility that the data is nonlinear. Most applications of data visualization in these fields use linear proximity measures, which ignore nonlinearity and can therefore give misleading results if the data is significantly nonlinear. Topological mapping is a nonlinear visualization method; its application is exemplified here using one particular topological mapping method, the Self-Organizing Map, and a small historical text corpus.

Keywords: Historical linguistics, nonlinearity, high-dimensional data, topological mapping, clustering
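
The abstract's central claim, that linear proximity measures can mislead when data is nonlinear, can be illustrated with a toy sketch. The example below is not taken from the paper and uses no linguistic data: it assumes points lying on a semicircle as a stand-in for a curved data manifold, and the function names (`euclidean`, `arc_points`, `distance_along_curve`) are hypothetical. It compares the straight-line distance between the two endpoints with the distance measured along the curve itself.

```python
import math

def euclidean(p, q):
    """Straight-line (linear) distance between two points."""
    return math.dist(p, q)

def arc_points(n, radius=1.0):
    """Return n points evenly spaced along a semicircle.

    The semicircle is an assumed toy example of a curved (nonlinear)
    one-dimensional manifold embedded in two-dimensional space.
    """
    return [
        (radius * math.cos(math.pi * i / (n - 1)),
         radius * math.sin(math.pi * i / (n - 1)))
        for i in range(n)
    ]

def distance_along_curve(points, i, j):
    """Approximate the distance along the curve between points i and j
    by summing the short steps between consecutive neighbours."""
    lo, hi = sorted((i, j))
    return sum(euclidean(points[k], points[k + 1]) for k in range(lo, hi))

points = arc_points(101)

# Linear measure: cuts straight across the inside of the curve.
linear = euclidean(points[0], points[-1])

# Measure that respects the curvature: follows the manifold.
along = distance_along_curve(points, 0, len(points) - 1)

print(f"straight-line distance: {linear:.4f}")
print(f"distance along curve:   {along:.4f}")
```

On this unit semicircle the straight-line distance between the endpoints is about 2, whereas the distance along the curve is about π, so the linear measure understates how far apart the endpoints are on the manifold. The sketch only illustrates the general problem; it does not implement topological mapping or the Self-Organizing Map.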

Article outline

1. Introduction
2. Nonlinearity
- 2.1 Nonlinearity in natural processes
- 2.2 Nonlinearity in data
- 2.3 Nonlinearity in linguistic data
3. The problem
4. Topological mapping
- 4.1 Topology
- 4.2 Projection of topological structure into low-dimensional space
- 4.3 Preservation of nonlinearity
- 4.4 Example
  - 4.4.1 The text collection
  - 4.4.2 Spelling data
  - 4.4.3 The Self-Organizing Map
  - 4.4.4 Result
5. Conclusion
References
