한글 문서의 통계적 특징
Statistical Characteristics for Korean Texts
- Hoseo University Central Library (호서대학교 중앙도서관)
- Journal of Hoseo University (호서대학교 논문집)
- Vol. 3
- 1995.12, pp. 173-184 (12 pages)
In this paper, the occurrence frequencies of the three kinds of phonemes (chosung, jungsung and jongsung, i.e. initial consonants, medial vowels and final consonants), of syllables, and of words in ordinary Korean text are investigated using a corpus of over 800,000 words. The six most frequent chosungs, jungsungs and jongsungs account for 70%, 80% and 90% of the corresponding phoneme occurrences, respectively. The number of distinct syllables occurring in the text is 1,705, which is about 15% of all possible syllables in modern Korean. The 315 most frequent syllables account for 90% of all syllable occurrences. The 10 and 20 most frequent words account for 4.4% and 6.6% of all word occurrences, respectively. While the average word length over distinct words is 3.88, the average length over word occurrences is 2.88.
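The kinds of statistics reported in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' procedure: it assumes the text is stored as precomposed Unicode Hangul syllables (U+AC00 to U+D7A3), that words are separated by whitespace, and that word length is counted in characters. All function names (`decompose`, `coverage`, `entropy`, `average_word_lengths`) are hypothetical. The Unicode block holds 19 × 21 × 28 = 11,172 syllables, which is consistent with 1,705 observed syllables being about 15% of the possible ones.

```python
from collections import Counter
import math

# Assumption: precomposed Hangul syllables in the Unicode block U+AC00..U+D7A3.
HANGUL_BASE = 0xAC00
NUM_CHO = 19    # initial consonants (chosung)
NUM_JUNG = 21   # medial vowels (jungsung)
NUM_JONG = 28   # final consonants (jongsung), index 0 meaning "none"
NUM_SYLLABLES = NUM_CHO * NUM_JUNG * NUM_JONG  # 11172


def decompose(ch):
    """Return (chosung, jungsung, jongsung) indices, or None if ch is not a Hangul syllable."""
    code = ord(ch) - HANGUL_BASE
    if not 0 <= code < NUM_SYLLABLES:
        return None
    cho, rest = divmod(code, NUM_JUNG * NUM_JONG)
    jung, jong = divmod(rest, NUM_JONG)
    return cho, jung, jong


def coverage(counter, k):
    """Fraction of all occurrences covered by the k most frequent items."""
    total = sum(counter.values())
    return sum(count for _, count in counter.most_common(k)) / total


def entropy(counter):
    """Shannon entropy in bits of the empirical distribution in counter."""
    total = sum(counter.values())
    return -sum((c / total) * math.log2(c / total) for c in counter.values())


def average_word_lengths(word_counts):
    """Return (mean length over distinct words, mean length over occurrences)."""
    distinct = sum(len(w) for w in word_counts) / len(word_counts)
    occurrences = sum(len(w) * c for w, c in word_counts.items()) / sum(word_counts.values())
    return distinct, occurrences


if __name__ == "__main__":
    # Share of possible syllables observed, using the count from the abstract.
    print(round(1705 / NUM_SYLLABLES, 3))  # 0.153

    # Tiny made-up sample, for illustration only (not data from the paper).
    sample = "한글 문서의 통계적 특징 한글 문서"
    syllables = Counter(ch for ch in sample if decompose(ch) is not None)
    chosungs = Counter(decompose(ch)[0] for ch in syllables.elements())
    words = Counter(sample.split())
    print(coverage(syllables, 3), entropy(chosungs), average_word_lengths(words))
```

Frequent words tend to be short, so the mean length over occurrences is pulled below the mean over distinct words, matching the direction of the 2.88 versus 3.88 figures in the abstract.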
Abstract
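1. Introduction
2. Occurrence Frequencies of Phonemes (Chosung, Jungsung, Jongsung)
3. Syllable Frequencies
4. Word Occurrence Frequencies
5. Entropy of Phonemes, Syllables and Words
6. Conclusion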
References