In: Corpus Methods for Semantics: Quantitative studies in polysemy and synonymy
Edited by Dylan Glynn and Justyna A. Robinson
[Human Cognitive Processing 43] 2014
pp. 405–441
Cluster analysis
Finding structure in linguistic data
Published online: 6 November 2014
https://doi.org/10.1075/hcp.43.16div
Cluster analysis is an exploratory data analysis technique that encompasses a number of algorithms and methods for sorting objects into groups. It requires the analyst to make choices about, among other things, dissimilarity measures and grouping algorithms, and these choices are difficult to make without an understanding of their theoretical implications and a very good understanding of the data. This chapter introduces the distance measures and clustering algorithms most commonly used in cluster-analytic work. Unlike Baayen (2008), Johnson (2008) and Gries (2009), its main aim is to equip the researcher with at least a basic understanding of what happens behind the scenes when a dataset is explored with a particular cluster-analytic technique.
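The two choices named above, a dissimilarity measure and a grouping algorithm, can be illustrated with a small sketch. It is not taken from the chapter: the helper names `euclidean`, `manhattan` and `agglomerate` are hypothetical, the four toy items are invented, and the brute-force agglomerative procedure is only suitable for tiny datasets. For real work one would use an established implementation, such as `hclust` in R or `scipy.cluster.hierarchy.linkage` in Python.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Straight-line (Euclidean) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """City-block (Manhattan) distance: sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def agglomerate(points, dist=euclidean, linkage="average"):
    """Brute-force agglomerative hierarchical clustering (toy data only).

    Every item starts as its own cluster; at each step the two closest
    clusters are merged. Returns a list of (members_a, members_b, height).
    """
    clusters = {i: [i] for i in range(len(points))}
    next_id = len(points)
    merges = []

    def cluster_dist(ca, cb):
        # All item-to-item distances between the two clusters.
        ds = [dist(points[i], points[j]) for i in ca for j in cb]
        if linkage == "single":      # nearest neighbour
            return min(ds)
        if linkage == "complete":    # furthest neighbour
            return max(ds)
        return sum(ds) / len(ds)     # average linkage (UPGMA)

    while len(clusters) > 1:
        # Closest pair of current clusters (on ties, the first pair found wins).
        (a, b), height = min(
            (((p, q), cluster_dist(clusters[p], clusters[q]))
             for p, q in combinations(clusters, 2)),
            key=lambda pair: pair[1],
        )
        merges.append((sorted(clusters[a]), sorted(clusters[b]), height))
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        next_id += 1
    return merges


if __name__ == "__main__":
    # Hypothetical toy data: four items described by two numeric features.
    toy = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
    for method in ("single", "average", "complete"):
        for a, b, h in agglomerate(toy, linkage=method):
            print(f"{method:>8}: merge {a} + {b} at height {h:.2f}")
```

On these invented points all three linkage rules merge the items in the same order, but the height of the final merge differs (about 6.40 for single, 7.09 for average and 7.81 for complete linkage). Even in a trivial case, the choice of algorithm therefore changes the resulting tree.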
Keywords: clustering algorithms, distance measures
References (19)
Alviar, J.J. (2008). Recent advances in computational linguistics and their application to biblical studies. New Testament Studies, 54(1), 139–159.
Baayen, R.H. (2008). Analyzing linguistic data: A practical introduction to statistics using R. Cambridge: Cambridge University Press.
Backhaus, K., Erichson, B., Plinke, W., & Weiber, R. (1996). Multivariate Analysemethoden: Eine anwendungsorientierte Einführung (8th edition). Berlin; Heidelberg; New York: Springer.
Brock, G., Pihur, V., Datta, S., & Datta, S. (2011). clValid: Validation of clustering results. Journal of Statistical Software, 25(4), March 2008. R package version 0.6-2. [URL].
Divjak, D., & Gries, St. Th. (2006). Ways of trying in Russian: Clustering behavioral profiles. Corpus Linguistics and Linguistic Theory, 2(1), 23–60.
Everitt, B.S., Landau, S., Leese, M., & Stahl, D. (2011). Cluster analysis (5th edition). Oxford: Wiley.
Gower, J., & Legendre, P. (1986). Metric and Euclidean properties of dissimilarity coefficients. Journal of Classification, 3(1), 5–48.
Gries, St. Th. (2009). Statistics for linguistics with R: A practical introduction. Berlin: Mouton de Gruyter.
Harnad, S. (2005). To cognize is to categorize: Cognition is categorization. In C. Lefebvre & H. Cohen (Eds.), Handbook on categorization (pp. 19–43). Oxford & London: Elsevier.
Hennig, C. (2010). fpc: Flexible procedures for clustering. R package version 2.0-3. [URL].
Kaufman, L., & Rousseeuw, P.J. (1990). Finding groups in data: An introduction to cluster analysis (Series in Applied Probability and Statistics). New York: Wiley-Blackwell.
Milligan, G.W., & Cooper, M.C. (1985). An examination of procedures for determining the number of clusters in a data set. Psychometrika, 50, 159–179.
R Development Core Team (2008). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0. [URL].
Rousseeuw, P.J. (1987). Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. Journal of Computational and Applied Mathematics, 20(1), 53–65.
Shaw, D. (1974). Statistical analysis of dialectal boundaries. Computers and the Humanities, 8, 173–177.
Suzuki, R., & Shimodaira, H. An R package for hierarchical clustering with p-values. Retrieved from [URL] [Accessed 25 May 2012].
Cited by (28)
Astobiza, Aníbal M.
Dai, Ying & Yicheng Wu
Zhang, Yixuan, Yimeng Wang, Nutchanon Yongsatianchot, Joseph D Gaggiano, Nurul M Suhaimi, Anne Okrah, Miso Kim, Jacqueline Griffin & Andrea G Parker
Liu, Meili
Milin, Petar, Benjamin V. Tucker & Dagmar Divjak
Robledo, Hernán & Rogelio Nazar. 2023. A proposal for the inductive categorisation of parenthetical discourse markers in Spanish using parallel corpora. International Journal of Corpus Linguistics 28:4, pp. 500 ff.
SUGAWARA, Yuki & Kazuho KAMBARA
Van den Heede, Margot & Peter Lauwers
Wu, Shuqiong & Yue Ou
Zhou, Jiangping
王, 婷
Krawczak, Karolina. 2022. Modeling constructional variation. In Analogy and Contrast in Language [Human Cognitive Processing, 73], pp. 341 ff.
Siahaan, Poppy
Wang, Jiaojiao & Jiangping Zhou
Torres, Peter Joseph
Johansson, Marjut & Veronika Laippala. 2020. Affectivity in the #jesuisCharlie Twitter discussion. Pragmatics. Quarterly Publication of the International Pragmatics Association (IPrA) 30:2, pp. 179 ff.
Dattner, Elitzur
Proos, Mariann. 2019. Polysemy of the Estonian perception verb nägema ‘to see’. In Perception Metaphors [Converging Evidence in Language and Communication Research, 19], pp. 231 ff.
Vandevoorde, Lore
Kifokeris, Dimosthenis & Yiannis Xenidis
Ioannou, Georgios. 2017. A corpus-based analysis of the verb pleróo in Ancient Greek. Review of Cognitive Linguistics 15:1, pp. 253 ff.
Ioannou, Georgios
Vandevoorde, Lore, Els Lefever, Koen Plevoets & Gert De Sutter. 2017. A corpus-based study of semantic differences in translation. Target. International Journal of Translation Studies 29:3, pp. 388 ff.
Desagulier, Guillaume
Desagulier, Guillaume
Desagulier, Guillaume
[no author supplied]. 2016. Review of Moisl (2015): Cluster Analysis for Corpus Linguistics. International Journal of Corpus Linguistics 21:4, pp. 581 ff.
This list is based on CrossRef data as of 10 December 2025. Please note that it may not be complete. Sources presented here have been supplied by the respective publishers. Any errors therein should be reported to them.
