Unsupervised Learning of Linguistic Structure: An Empirical Evaluation

Powers, David

doi:10.1075/ijcl.2.1.06pow

Article published in: International Journal of Corpus Linguistics
Vol. 2:1 (1997), pp. 91–131

David Powers | Flinders University of SA

Published online: 1 January 1997

Computational Linguistics and Natural Language have long been targets for Machine Learning, and a variety of learning paradigms and techniques have been employed with varying degrees of success. In this paper, we review approaches which have adopted an unsupervised learning paradigm, explore the assumptions which underlie the techniques used, and develop an approach to empirical evaluation. We concentrate on a statistical framework based on N-grams, although we seek to maintain neurolinguistic plausibility. The model we adopt places putative linguistic units in focus and associates them with a characteristic vector of statistics derived from occurrence frequency. These vectors are treated as defining a hyperspace, within which we demonstrate a technique for examining the empirical utility of the various metrics and normalization, visualization, and clustering techniques proposed in the literature. We conclude with an evaluation of the relative utility of a large array of different metrics and processing techniques in relation to our defined performance criteria.

Keywords: Unsupervised Learning, Singular Value Decomposition, Spearman Rank Correlation, Multidimensional Scaling, Feature Maps, Self-Organization, Phonology, Classification, Orthography, Tagging, Syntax
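
The model described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes the simplest case, in which the putative units are single characters and the statistics are bigram counts of left and right neighbours. Each unit's count vector is then compared with Spearman rank correlation, one of the metrics named in the keywords. The function names (`context_vectors`, `ranks`, `pearson`, `spearman`) are hypothetical and chosen only for this illustration.

```python
def context_vectors(text, units):
    """Map each unit to a vector of left- and right-neighbour counts.

    Assumption: units are single characters and contexts are bigrams.
    The first half of each vector counts left neighbours, the second
    half counts right neighbours.
    """
    symbols = sorted(set(text))
    index = {s: i for i, s in enumerate(symbols)}
    width = len(symbols)
    vectors = {u: [0] * (2 * width) for u in units}
    for left, right in zip(text, text[1:]):
        if right in vectors:
            vectors[right][index[left]] += 1           # left-neighbour slot
        if left in vectors:
            vectors[left][width + index[right]] += 1   # right-neighbour slot
    return vectors


def ranks(values):
    """Return 1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


if __name__ == "__main__":
    # Toy corpus, for illustration only.
    corpus = "the cat sat on the mat and the dog sat on the log"
    vecs = context_vectors(corpus, "aeiotn")
    for other in "eiotn":
        print("a", other, round(spearman(vecs["a"], vecs[other]), 3))
```

In this sketch, units whose neighbour counts rank their contexts similarly receive a correlation near 1, so the pairwise scores could serve as input to a clustering or visualization step of the kind the abstract mentions. A realistic experiment would need a far larger corpus and a deliberate choice of normalization.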
