In: Language and Text: Data, models, information and applications
Edited by Adam Pawłowski, Jan Mačutek, Sheila Embleton and George Mikros
[Current Issues in Linguistic Theory 356] 2021
pp. 21–36
Term distance, frequency and collocations
Published online: 22 December 2021
https://doi.org/10.1075/cilt.356.02joh
Abstract
In this paper I study two co-occurrence measures, local to a particular corpus, for constructing collocations or relevance relations between words or terms. The first is a distance measure; the second uses two co-occurrence windows of different sizes, one contained in the other. Both are discussed in relation to the common method of comparing co-occurrence measures within a particular corpus to those of a reference corpus. A practical consequence is that these measures may remove the need to compute a reference statistic, which can be computationally expensive. I also believe that distance, as a measure in its own right, is of theoretical interest: being different from frequency, it may add something new to collocation analysis.
Keywords: collocation, term distance, frequency, Bayes, probability, concordance
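The abstract does not give the chapter's exact definitions, so the following is only a minimal sketch of the two kinds of corpus-local statistic it mentions: a mean token distance between a target word and its collocates, and a comparison of counts in a small window nested inside a larger one. All function names, the toy sentence and the specific ratio used are assumptions made for illustration; they are not taken from the chapter.

```python
from collections import Counter, defaultdict


def window_counts(tokens, target, window):
    """Count words occurring within `window` positions of each `target` hit."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != target:
            continue
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                counts[tokens[j]] += 1
    return counts


def mean_distances(tokens, target, window):
    """Mean absolute token distance from `target` to each collocate."""
    total = defaultdict(int)
    hits = Counter()
    for i, tok in enumerate(tokens):
        if tok != target:
            continue
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                total[tokens[j]] += abs(j - i)
                hits[tokens[j]] += 1
    return {w: total[w] / hits[w] for w in hits}


def nested_window_ratio(tokens, target, small, large):
    """Share of a collocate's large-window hits that fall in the small window.

    Hypothetical statistic: an illustration of nested windows only.
    """
    inner = window_counts(tokens, target, small)
    outer = window_counts(tokens, target, large)
    return {w: inner[w] / outer[w] for w in outer}


# Toy corpus (invented for this example).
tokens = "she drinks strong coffee and he drinks weak tea with strong coffee".split()

print(window_counts(tokens, "coffee", 2))
print(mean_distances(tokens, "coffee", 2))
print(nested_window_ratio(tokens, "coffee", 1, 2))
```

Neither statistic needs a reference corpus: both are computed from the positions of words in the corpus under study, which is the practical point the abstract makes.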
Article outline
- 1. Introduction
- 2. Δ-score and Pointwise Mutual Information
- 3. Data and technical method
- 4. Collocations
- 4.1 Frequency and context enlargement
- 4.2 Distance
- 4.2.1 The verb
- 4.2.2 The noun
- 5. Discussion
- Notes
- References
