Predicting semantic transparency of Chinese QIEs using distributional semantics and lexical frequency

Ruo, Lin; Skalicky, Stephen

doi:10.1075/ijchl.00045.ruo

Article In: International Journal of Chinese Linguistics
Vol. 13:1 (2026), pp. 69–99

Lin Ruo | Guangzhou Institute of Science and Technology

Stephen Skalicky | Victoria University of Wellington

This content is being prepared for publication; it may be subject to changes.

Abstract

Semantic transparency refers to the relationship between the meaning of a whole word and the meanings of its constituent morphemes. Mandarin Chinese Quadrisyllabic Idiomatic Expressions (QIEs, also known as Cheng-Yu) have a similar property: the meaning of the whole is often more than the sum of the meanings of the component words. Using an existing set of semantic transparency labels, we analyze the semantic transparency of QIEs as a function of computational semantic similarity, word frequency, and syntactic structure. Our results indicate that the probability of a QIE being labelled as transparent increases with its frequency, whereas computational measures of semantic similarity and syntactic structure were not strongly associated with the transparency labels. We hypothesize that these results may reflect the nature of QIEs, whose meanings are sometimes obscure and derive from traditional stories rather than from any properties of the constituents, semantic or otherwise. We therefore advocate including word frequency as a factor in transparency ratings of QIEs, and we suggest exploring other variables that may improve transparency rating models.

Keywords: semantic transparency, Quadrisyllabic Idiomatic Expressions (QIEs), word frequency, semantic similarity, syntactic structure

Article outline

1. Introduction
2. Literature review
- 2.1 Semantic transparency
  - 2.1.1 Human ratings of semantic transparency
  - 2.1.2 Computational measures of semantic transparency
- 2.2 Chinese QIEs and its semantic transparency
  - 2.2.1 The feature of Chinese QIEs
    - Two levels of meaning
    - Varied syntactic and morphological patterns
    - Syntactic patterns
    - Morphological patterns
  - 2.2.2 Relationship between structure and transparency
  - 2.2.3 Relationship between frequency and transparency
  - 2.2.4 Wu (2016) Study
- 2.3 Current study
3. Method
- 3.1 Data
- 3.2 Segmenting QIEs into constituents
- 3.3 Measuring QIE Frequency
- 3.4 Measuring semantic similarity of the QIEs
  - External Similarity (ES)
  - Internal Similarity (IS)
- 3.5 Statistical analysis
4. Results
- Descriptive statistics
- Model 1. Full range of transparency levels (continuous)
- Model 2. Logistic regression predicting transparent and opaque only
5. Discussion
- Word frequency and QIE transparency
- Semantic similarity and QIE transparency
- Selection of structure pattern
- Insights from model comparisons
6. Conclusion
Notes
Author queries
References

References (51)

Bell, M. J., & Schaefer, M. (2013). Semantic transparency: Challenges for distributional semantics. IWCS 2013 Workshop Towards a Formal Distributional Semantics, 1–10.

Bell, M. J., & Schäfer, M. (2016). Modelling semantic transparency. Morphology, 26(2), 157–199.

Chen, Y., & Xing, H. (2010). Semantic extraction and semantic transparency automatic assessment experiments based on distributed characteristics. Modern Chinese (Language Research Version), 031, 111–113.

China National Office for Teaching Chinese as Foreign Language. (1992). The Syllabus of Graded Characters for Chinese Proficiency. Beijing Language and Culture University Press.

Frisson, S., Niswander-Klement, E., & Pollatsek, A. (2008). The role of semantic transparency in the processing of English compound words. British Journal of Psychology, 99(1).

Fu, P. (2012). Semantic Transparency Analysis of Chinese Idioms Based on TCFL [Master’s thesis]. Shandong University.

Gagné, C. L., & Spalding, T. L. (2016). Effects of morphology and semantic transparency on typing latencies in English compound and pseudocompound words. Journal of Experimental Psychology: Learning, Memory, and Cognition, 42(9).

Gagné, C. L., Spalding, T. L., & Schmidtke, D. (2019). LADEC: The Large Database of English Compounds. Behavior Research Methods, 51(5), 2152–2179.

Kim, S. Y., Yap, M. J., & Goh, W. D. (2019). The role of semantic transparency in visual word recognition of compound words: A megastudy approach. Behavior Research Methods, 51(6).

Landauer, T. K., Foltz, P. W., & Laham, D. (1998). An introduction to latent semantic analysis. Discourse Processes, 25(2–3).

Li, D., Zhang, Y., & Wang, X. (2016). Descriptive norms for 350 Chinese idioms with seven syntactic structures. Behavior Research Methods, 48(4).

Li, J. (2011). A quantification analysis of the transparency of the lexical meaning of the modern Chinese dictionary. Chinese Linguistics, 35(3), 54–62.

Li, J., & Li, Y. (2008). On the transparency of lexical meaning. Studies in Language and Linguistics, 28(3), 60–65.

Li, L., & Hin Tat, C. (2014). Acquisition of Chinese quadra-syllabic idiomatic expressions: Effects of semantic opacity and structural symmetry. First Language, 34(4), 336–353.

Li, M., Jiang, N., & Gor, K. (2017). L1 and L2 processing of compound words: Evidence from masked priming experiments in English. Bilingualism, 20(2).

Libben, G., Gibson, M., Yoon, Y. B., & Sandra, D. (2003). Compound fracture: The role of semantic transparency and morphological headedness. Brain and Language, 84(1), 50–64.

Liu, T. H., & Su, L. I. W. (2021). Chinese idioms as constructions: Frequency, semantic transparency and their processing. Language and Linguistics, 22(4).

Liu, Z. Q., & Xing, M. P. (2000). Semantic symmetry and cognition on Chinese four character idioms. Chinese Teaching in the World, 511, 77–81.

Marelli, M., & Luzzatti, C. (2012). Frequency effects in the processing of Italian nominal compounds: Modulation of headedness and semantic transparency. Journal of Memory and Language, 66(4).

Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient estimation of word representations in vector space. 1st International Conference on Learning Representations, ICLR 2013 — Workshop Track Proceedings.

Mok, L. W. (2009). Word-superiority effect as a function of semantic transparency of Chinese bimorphemic compound words. Language and Cognitive Processes, 24(7–8), 1039–1081.

Pennington, J., Socher, R., & Manning, C. D. (2014). GloVe: Global vectors for word representation. EMNLP 2014 — 2014 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference.

Pollatsek, A., & Hyönä, J. (2005). The role of semantic transparency in the processing of Finnish compound words. Language and Cognitive Processes, 20(1–2), 261–290.

Rayner, K., Li, X., & Pollatsek, A. (2007). Extending the E-Z reader model of eye movement control to Chinese readers. Cognitive Science, 31(6).

Reddy, S., McCarthy, D., & Manandhar, S. (2011). An Empirical Study on Compositionality in Compound Nouns. IJCNLP 2011 — Proceedings of the 5th International Joint Conference on Natural Language Processing, 210–218.

Rehurek, R., & Sojka, P. (2011). Gensim — Statistical Semantics in Python. In Lecture Notes in Computer Science (Vol. 66111).

Sag, I. A., Baldwin, T., Bond, F., Copestake, A., & Flickinger, D. (2002). Multiword Expressions: A Pain in the Neck for NLP. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 22761, pp. 1–15).

Schmidtke, D., Van Dyke, J. A., & Kuperman, V. (2018). Individual variability in the semantic processing of English compound words. Journal of Experimental Psychology: Learning, Memory, and Cognition, 44(3).

Song, B., Zhou, X., & Jin, T. (2017). A Study on the Coverage and Semantic Transparency of Words Exceeding the Standard of Vocabulary. Chinese Language Learning, 31, 95–104.

Song, Y., Shi, S., Li, J., & Zhang, H. (2018). Directional skip-gram: Explicitly distinguishing left and right context for word embeddings. NAACL HLT 2018: 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Conference, 21.

Tabossi, P., Fanari, R., & Wolf, K. (2008). Processing Idiomatic Expressions: Effects of Semantic Compositionality. Journal of Experimental Psychology: Learning, Memory, and Cognition, 34(2), 313–327.

Tang, X., & Liang, S. (2020). Study on Semantic Transparency of Chinese Compounds Based on Word Embedding. 2020 International Conference on Asian Language Processing, IALP 2020, 20141, 130–134.

The Institute of Linguistics of the Chinese Academy of Social Sciences. (2012). Contemporary Chinese Dictionary (Sixth Edition). The Commercial Press.

Tse, C. S., Yap, M. J., Chan, Y. L., Sze, W. P., Shaoul, C., & Lin, D. (2017). The Chinese Lexicon Project: A megastudy of lexical decision performance for 25,000+ traditional Chinese two-character compound words. Behavior Research Methods, 49(4), 1503–1519.

Tsou, B. K. (2012). Idiomaticity and classical traditions in some East Asian languages. Proceedings of the 26th Pacific Asia Conference on Language, Information, and Computation, 39–55.

Ullmann, S. (1962). Semantics: An Introduction to the Science of Meaning. Oxford: Basil Blackwell.

Wang, H. C., Hsu, L. C., Tien, Y. M., & Pomplun, M. (2012). Estimating Semantic Transparency of Constituents of English Compounds and Two-Character Chinese Words using Latent Semantic Analysis. Building Bridges Across Cognitive Sciences Around the World — Proceedings of the 34th Annual Meeting of the Cognitive Science Society, CogSci 2012, 2499–2504.

Wang, H. C., Hsu, L. C., Tien, Y. M., & Pomplun, M. (2014). Predicting raters’ transparency judgments of English and Chinese morphological constituents using latent semantic analysis. Behavior Research Methods, 46(1), 284–306.

Wang, S., Huang, C. R., Yao, Y., & Chan, A. (2019). The effect of morphological structure on semantic transparency ratings. Language and Linguistics, 20(2), 225–255.

Wang, X. (2010). The Chinese Idiom Dictionary. Sinolingua Press.

Wu, D. (2016). Research on the Semantic Transparency of Chinese Idioms and their Production in Essays by Primary And Secondary School Students [Master’s thesis]. Beijing Language and Culture University.

Xing, H., Li, S., Li, M., Wu, P., Shi, G., & Shu, H. (2016). Research on Components and Development of Students’ Native Language Competence. Chinese Journal of Language Policy and Planning, 1(5), 28–36.

Xu, J. (2015). Corpus-based Chinese studies: A historical review from the 1920s to the present. Chinese Language and Discourse, 6(2), 218–244.

Xu, S. (2006). Proximity and complementation: studies of the formation mechanism of idioms from a cognitive point of view. Journal of Sichuan International Studies University, 2–107.

Xu, Y. (2014). Semantic Transparency Research on Frequently Used Chinese Compound Words For Second Language Teaching. Beijing Normal University.

Zhan, W., Guo, R., & Chen, Y. (2003). The CCL Corpus of Chinese Texts: 700 million Chinese Characters, the 11th Century BC-present, Available online at the website of Center for Chinese Linguistics (abbreviated as CCL) of Peking University.

Zhan, W., Guo, R., Chang, B., Chen, Y., & Chen, L. (2019). The building of the CCL corpus: its design and implementation. Corpus Linguistics, 6(1), 71–86.

Zhao, D. (2016). Semantic Transparency Study of Commonly Used Idioms. Modern Communication, 21(4), 238–241.

Zhou, J. (2004). Theory on Word-structure of Chinese (in Chinese: Hanyu cihui jiegoulun). Shanghai Dictionary Press.

Zhou, X., & Marslen-Wilson, W. (1995). Morphological Structure in the Chinese Mental Lexicon. Language and Cognitive Processes, 10(6).

Zwitserlood, P. (1994). The Role of Semantic Transparency in the Processing and Representation of Dutch Compounds. Language and Cognitive Processes, 9(3).