In: Mathematical Modelling in Linguistics and Text Analysis: Theory and applications
Edited by Adam Pawłowski, Sheila Embleton, Jan Mačutek and Aris Xanthos
[Current Issues in Linguistic Theory 370] 2025
pp. 81–89
Analyzing Japanese texts with evaluation of randomness in binary expression
Yosuke Takubo | National Institute of Technology (KOSEN), Niihama College | High Energy Accelerator Research Organization (KEK)
Published online: 13 October 2025
https://doi.org/10.1075/cilt.370.07tak
Abstract
Quantitative evaluation of the randomness of binary sequences is essential in the development of random
number generators (RNGs), a crucial technology for encrypted communication. The same techniques may
be applicable to the statistical analysis of natural language texts. Borel normality is one such technique, and this
study applies it to Japanese texts as a first trial. The Borel normality measure of a Japanese text is calculated after
transforming the text into a binary expression with a character encoding such as UTF-8, SJIS, or EUC. The distributions
of the measure are compared across registers and character encodings, and the features of Borel normality in Japanese
texts are discussed.
Keywords: Borel normality, Japanese, binary encoding, BCCWJ
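The procedure outlined in the abstract (encode the text, read the bytes as a bit sequence, and test how evenly fixed-length bit blocks are distributed) can be illustrated with a minimal Python sketch. It is not the chapter's implementation. The function names `to_bits` and `borel_deviation` and the sample sentence are hypothetical, and the statistic used here, the largest deviation of m-bit block frequencies from the uniform value 2^-m over non-overlapping blocks, is only one common finite-string reading of Borel normality; the measure defined in the chapter may differ.

```python
from collections import Counter


def to_bits(text: str, encoding: str) -> str:
    """Encode text and return its bytes as a string of '0'/'1' characters."""
    return "".join(f"{byte:08b}" for byte in text.encode(encoding))


def borel_deviation(bits: str, m: int) -> float:
    """Largest deviation of m-bit block frequencies from the uniform value 2**-m.

    Assumption: blocks are consecutive and non-overlapping, and any
    trailing bits that do not fill a whole block are ignored.
    """
    n_blocks = len(bits) // m
    if n_blocks == 0:
        raise ValueError("bit string is shorter than one block")
    counts = Counter(bits[i * m:(i + 1) * m] for i in range(n_blocks))
    expected = 2.0 ** -m
    # Patterns that never occur have count 0, so iterate over all 2**m patterns.
    return max(
        abs(counts.get(format(u, f"0{m}b"), 0) / n_blocks - expected)
        for u in range(2 ** m)
    )


# Illustrative sample sentence, not taken from the corpus used in the chapter.
sample = "吾輩は猫である。名前はまだ無い。"
for encoding in ("utf-8", "shift_jis", "euc_jp"):
    bits = to_bits(sample, encoding)
    print(encoding, len(bits), borel_deviation(bits, 1))
```

The same text yields bit sequences of different lengths and different block statistics under each encoding, which is why a comparison across encodings is meaningful. A value near zero indicates block frequencies close to those of a uniformly random sequence.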
Article outline
- 1. Introduction
- 2. Borel normality
- 3. Analysis method
- 4. Analysis results
- 5. Summary and conclusions
- Acknowledgements
- References
