Article published in: Lexical semantics towards the big-data era
Edited by Meichun Liu and Chunyu Kit
[Chinese Language and Discourse 14:1] 2023
pp. 50–99
What more can empirical contextual data tell about the real usage of words and collocations?
A case study of the qià (恰) cluster with Chinese Gigaword data
Published online: 27 June 2022
https://doi.org/10.1075/cld.00041.kit
Abstract
This article presents a corpus-based distributional analysis of the usage patterns of a cluster of words and compounds containing the morpheme qià (恰) ‘just, exactly’, with the aid of an extended concordancer that retrieves representative collocations from their adjacent contexts in Chinese Gigaword. Following a survey of the historical evolution of the qià cluster with exemplar data and an overview of existing proposals that account for their usage in terms of expectational match, we conduct a distributional analysis to identify the salient collocational or contextual features, which leads to a number of interesting findings. Substantial evidence is provided to clarify the non-word status of qià rú (恰如) and qià sì (恰似) and their similarities, the exchangeability of qiàhǎo (恰好) and qiàqiǎo (恰巧), the distinct collocational preferences of the adverbs qià (恰), qiàqià (恰恰) and the others for different subsets of verbs, the prosodic requirement of an even number of syllables for a qià-adverb and its main verb, and the contrasting popularity of qiàqià (恰恰) vs. qiàdàng (恰当), which reveals different usage tendencies between speakers in Taiwan and the Mainland. All these novel findings and insights about the subtle (dis)similarities in the usage and meanings of the qià (恰) cluster suggest that distributional analysis of contextual collocations using large-scale language data remains a powerful tool that can complement other analytical approaches for the advancement of lexical semantic research.
Article outline
- 1. Introduction
- 2. The origins and diachronic development of the qià (恰) cluster
- 3. Methodology
- 4. Data extraction and analysis
- 4.1 Transliterations and names as exceptions
- 4.2 Contexts of the qià (恰) cluster
- 4.2.1 Contexts of qià (恰)
- 4.2.2 Contexts of qiàqià (恰恰)
- 4.2.3 Contexts of qià rú (恰如) and qià sì (恰似)
- 4.2.4 Contexts of qiàhǎo (恰好) and qiàqiǎo (恰巧)
- 4.2.5 Contexts of qiàdàng (恰当)
- 4.3 Popularity of the qià (恰) cluster
- 5. Conclusions
- Notes
- References
