In: Language and Text: Data, models, information and applications
Edited by Adam Pawłowski, Jan Mačutek, Sheila Embleton and George Mikros
[Current Issues in Linguistic Theory 356] 2021
pp. 121–134
Distribution and characteristics of commonly used words across different texts in Japanese
Published online: 22 December 2021
https://doi.org/10.1075/cilt.356.08yam
Abstract
In this chapter, I survey the frequency distribution of commonly used words across different texts in Japanese,
using the Balanced Corpus of Contemporary Written Japanese. The results show the following: (1) the
distribution follows a curve similar to Zipf’s law, but the curve always begins to rise shortly before the degree of commonality
reaches its maximum; (2) neither the length nor the number of the texts affects the distribution trend; (3) as text length
increases, the number of commonly used words increases linearly, but it eventually reaches a maximum because the number of
basic words is limited.
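The kind of distribution described in the abstract can be illustrated with a minimal sketch. It assumes that a word's "degree of commonality" is the number of texts in which the word occurs at least once; the abstract does not define the term, so this reading is an assumption. The function name `commonality_distribution` and the toy token lists are hypothetical and are not taken from the chapter or from the corpus.

```python
from collections import Counter

def commonality_distribution(texts):
    """Map each degree of commonality k to the number of word types
    that occur in exactly k of the given texts.

    Assumption: "degree of commonality" = number of texts containing the word.
    """
    # Count each word type once per text, however often it occurs there.
    doc_freq = Counter()
    for tokens in texts:
        doc_freq.update(set(tokens))
    # Tally how many word types share each degree of commonality.
    return Counter(doc_freq.values())

# Toy, invented token lists standing in for segmented Japanese texts.
texts = [
    ["の", "は", "言語", "研究"],
    ["の", "は", "新聞", "語彙"],
    ["の", "が", "言語", "分布"],
]

dist = commonality_distribution(texts)
for k in sorted(dist):
    print(k, dist[k])
```

In this toy input, most word types occur in a single text and only one occurs in all three. Real corpus data would be needed to observe the rise near the maximum degree of commonality that the abstract reports.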
Article outline
- 1. The law of distribution of words
- 2. Previous studies
- 3. Data and method
- 4. Results
- 5. Interpretation
- 6. Conclusion and further challenges
References
Maekawa, Kikuo, Makoto Yamazaki, Toshinobu Ogiso, Takehiko Maruyama, Hideki Ogura, Wakako Kashino, Hanae Koiso, Masaya Yamaguchi, Makiro Tanaka & Yasuharu Den. 2014. Balanced Corpus of Contemporary Written Japanese. Language Resources and Evaluation 48. 345–371.
National Institute for Japanese Language and Linguistics. 1952. A Research Newspaper Vocabulary. Tokyo: Shuei Publishing Co.
