Article published in: The Unit of Processing in Chinese
Edited by Tianlin Wang
[International Journal of Chinese Linguistics 11:1] 2024
pp. 94–109

References (38)
Arnon, I., & Priva, U. C. (2013). More than words: The effect of multi-word frequency and constituency on phonetic duration. Lang. Speech, 56(Pt 3), 349–371.
Baker, A. (2022). Simplicity. In E. N. Zalta (Ed.), The Stanford encyclopedia of philosophy (Summer 2022). [URL]; Metaphysics Research Lab, Stanford University.
Beinborn, L., & Pinter, Y. (2023). Analyzing cognitive plausibility of subword tokenization. In H. Bouamor, J. Pino, & K. Bali (Eds.), Proceedings of the 2023 conference on empirical methods in natural language processing (pp. 4478–4486). Association for Computational Linguistics.
Brugnara, F., Falavigna, D., & Omologo, M. (1993). Automatic segmentation and labeling of speech based on hidden Markov models. Speech Commun., 12(4), 357–370.
Chater, N. (1999). The search for simplicity: A fundamental cognitive principle? Q. J. Exp. Psychol. A, 52A(2), 273–302.
Chater, N., & Vitányi, P. (2003). Simplicity: A unifying principle in cognitive science? Trends Cogn. Sci., 7(1), 19–22.
Delétang, G., Ruoss, A., Duquenne, P.-A., Catt, E., Genewein, T., Mattern, C., Grau-Moya, J., Wenliang, L. K., Aitchison, M., Orseau, L., Hutter, M., & Veness, J. (2023). Language modeling is compression. [URL]
Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of deep bidirectional transformers for language understanding. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), 4171–4186.
Feldman, J. (2016). The simplicity principle in perception and cognition. Wiley Interdiscip. Rev. Cogn. Sci., 7(5), 330–340.
Gage, P. (1994). A new algorithm for data compression. The C Users Journal Archive. [URL]
Goldwater, S., Griffiths, T. L., & Johnson, M. (2009). A Bayesian framework for word segmentation: Exploring the effects of context. Cognition, 112(1), 21–54.
Gruver, N., Finzi, M., Qiu, S., & Wilson, A. G. (2023). Large language models are Zero-Shot time series forecasters. [URL]
Isbilen, E. S., & Christiansen, M. H. (2020). Chunk-Based memory constraints on the cultural evolution of language. Top. Cogn. Sci., 12(2), 713–726.
Isbilen, E. S., McCauley, S. M., Kidd, E., & Christiansen, M. H. (2020). Statistically induced chunking recall: A Memory-Based approach to statistical learning. Cogn. Sci., 44(7), e12848.
Kudo, T. (2018). Subword Regularization: Improving Neural Network Translation Models with Multiple Subword Candidates. In I. Gurevych & Y. Miyao (Eds.), Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (pp. 66–75). Association for Computational Linguistics.
Kudo, T., & Richardson, J. (2018). SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing. Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, 66–71.
Lan, Z., Chen, M., Goodman, S., Gimpel, K., Sharma, P., & Soricut, R. (2020, February 8). ALBERT: A Lite BERT for Self-supervised Learning of Language Representations.
LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document recognition. Proc. IEEE, 86(11), 2278–2324.
Lieber, O., Sharir, O., Lenz, B., & Shoham, Y. (2021). Jurassic-1: Technical details and evaluation. White Paper. AI21 Labs, 11.
Meltzoff, A. N., Kuhl, P. K., Movellan, J., & Sejnowski, T. J. (2009). Foundations for a new science of learning. Science, 325(5938), 284–288.
Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient estimation of word representations in vector space. [URL]
Ouyang, L., Wu, J., Jiang, X., Almeida, D., Wainwright, C. L., Mishkin, P., Zhang, C., Agarwal, S., Slama, K., Ray, A., Schulman, J., Hilton, J., Kelton, F., Miller, L., Simens, M., Askell, A., Welinder, P., Christiano, P., Leike, J., & Lowe, R. (2022, March 4). Training language models to follow instructions with human feedback.
Pennington, J., Socher, R., & Manning, C. D. (2014). GloVe: Global vectors for word representation. Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 1532–1543.
Perruchet, P., & Vinter, A. (1998). PARSER: A model for word segmentation. J. Mem. Lang., 39(2), 246–263.
Raffel, C., Shazeer, N., Roberts, A., Lee, K., Narang, S., Matena, M., Zhou, Y., Li, W., & Liu, P. J. (2019). Exploring the limits of transfer learning with a unified Text-to-Text transformer. [URL]
Rissanen, J. (1978). Modeling by shortest data description. Automatica, 14(5), 465–471.
Ruoss, A., Delétang, G., Genewein, T., Grau-Moya, J., Csordás, R., Bennani, M., Legg, S., & Veness, J. (2023). Randomized positional encodings boost length generalization of transformers. [URL].
Schapiro, A. C., Turk-Browne, N. B., Norman, K. A., & Botvinick, M. M. (2016). Statistical learning of temporal community structure in the hippocampus. Hippocampus, 26(1), 3–8.
Schuster, M., & Nakajima, K. (2012). Japanese and Korean voice search. 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 5149–5152.
Sennrich, R., Haddow, B., & Birch, A. (2016). Neural machine translation of rare words with subword units. Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 1715–1725.
Sun, Y., Wang, S., Li, Y., Feng, S., Chen, X., Zhang, H., Tian, X., Zhu, D., Tian, H., & Wu, H. (2019, April 19). ERNIE: Enhanced Representation through Knowledge Integration.
Tian, Y., James, I., & Son, H. (2023). How Are Idioms Processed Inside Transformer Language Models? In A. Palmer & J. Camacho-Collados (Eds.), Proceedings of the 12th Joint Conference on Lexical and Computational Semantics (*SEM 2023) (pp. 174–179). Association for Computational Linguistics.
Yang, J. (2022). Discovering the units in language cognition: From empirical evidence to a computational model [PhD thesis, Radboud University & Max Planck Institute for Psycholinguistics].
Yang, J., Cai, Q., & Tian, X. (2020). How do we segment text? Two-stage chunking operation in reading. eNeuro, 7(3).
Yang, J., Frank, S. L., & van den Bosch, A. (2020). Less is Better: A cognitively inspired unsupervised model for language segmentation. Proceedings of the Workshop on the Cognitive Aspects of the Lexicon, 33–45. [URL]
Yang, J., van den Bosch, A., & Frank, S. L. (2022). Unsupervised text segmentation predicts eye fixations during reading. Frontiers in Artificial Intelligence, 51.
Yang, Z., Dai, Z., Yang, Y., Carbonell, J., Salakhutdinov, R., & Le, Q. V. (2020, January 2). XLNet: Generalized Autoregressive Pretraining for Language Understanding.
Zipf, G. K. (1949). Human behavior and the principle of least effort (Vol. 5731). Addison-Wesley Press. [URL]
Cited by (7)

Cited by seven other publications

Faturohman, Muhammad Iqbal & Miftahol Arifin
2025. TEXTBLOB-BASED SENTIMENT ANALYSIS OF TABUNGAN PERUMAHAN RAKYAT (TAPERA) POLICY: A PUBLIC PERCEPTION STUDY. J@ti Undip: Jurnal Teknik Industri 20:1 pp. 11 ff.
Jing, Heng, Qinbo Sun, Zhaohui Dang & Hua Wang
2025. Intention Recognition of Space Noncooperative Targets Using Large Language Models. Space: Science & Technology 5
Yoosefzadeh-Najafabadi, Mohsen
2025. From text to traits: exploring the role of large language models in plant breeding. Frontiers in Plant Science 16
Fernando, Chrisantha, Simon Osindero & Dylan Banarse
2024. The origin and function of external representations. Adaptive Behavior 32:6 pp. 515 ff.
Wang, Tianlin
2024. Introduction. International Journal of Chinese Linguistics 11:1 pp. 1 ff.
Yang, Jin, Zhiqiang Wang, Yanbin Lin & Zunduo Zhao
2024. 2024 IEEE International Conference on Big Data (BigData), pp. 6387 ff.
Zheng, Zaiyi, Yushun Dong, Song Wang, Haochen Liu, Qi Wang & Jundong Li
2024. 2024 IEEE International Conference on Big Data (BigData), pp. 805 ff.

This list is based on CrossRef data as of 12 December 2025. Please note that it may not be complete. Sources presented here have been supplied by the respective publishers. Any errors therein should be reported to them.