Academic journal

한글 문서의 통계적 특징

Statistical Characteristics for Korean Texts


In this paper, the occurrence frequencies of the three kinds of phonemes (chosung, jungsung and jongsung, i.e. initial consonants, medial vowels and final consonants), of syllables, and of words in ordinary Korean text are investigated using a corpus of over 800,000 words. The six most frequent chosungs, jungsungs and jongsungs account for 70%, 80% and 90% of the corresponding phoneme occurrences, respectively. The number of distinct syllables occurring in the text is 1,705, which is about 15% of all possible syllables in modern Korean. The 315 most frequent syllables account for 90% of all syllable occurrences. The 10 and 20 most frequent words account for 4.4% and 6.6% of all word occurrences, respectively. While the average length of distinct words is 3.88, the average length of word occurrences is 2.88.
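The counts described in the abstract can be approximated with a short sketch like the one below. It is not the authors' procedure: it assumes the text is encoded as precomposed Unicode Hangul syllables (U+AC00 to U+D7A3), and the helper names `decompose`, `count_units` and `coverage` are hypothetical. It also treats "no final consonant" as jongsung index 0, which may differ from how the paper counts jongsungs.

```python
from collections import Counter

# Unicode Hangul Syllables block layout: 19 initials x 21 medials x 28 finals.
# The 28 finals include index 0, meaning "no final consonant" (an assumption
# about how to count; the paper may exclude it).
HANGUL_BASE = 0xAC00
N_CHO, N_JUNG, N_JONG = 19, 21, 28
N_SYLLABLES = N_CHO * N_JUNG * N_JONG  # 11,172 possible modern syllables


def is_hangul_syllable(ch):
    """True if ch is a precomposed modern Hangul syllable."""
    return 0 <= ord(ch) - HANGUL_BASE < N_SYLLABLES


def decompose(syllable):
    """Return (chosung, jungsung, jongsung) indices of one syllable."""
    offset = ord(syllable) - HANGUL_BASE
    if not 0 <= offset < N_SYLLABLES:
        raise ValueError(f"not a Hangul syllable: {syllable!r}")
    cho, rest = divmod(offset, N_JUNG * N_JONG)
    jung, jong = divmod(rest, N_JONG)
    return cho, jung, jong


def count_units(text):
    """Count syllable and phoneme occurrences in a text."""
    syllables = Counter(ch for ch in text if is_hangul_syllable(ch))
    cho, jung, jong = Counter(), Counter(), Counter()
    for syl, n in syllables.items():
        c, j, f = decompose(syl)
        cho[c] += n
        jung[j] += n
        jong[f] += n
    return syllables, cho, jung, jong


def coverage(counter, k):
    """Fraction of all occurrences covered by the k most frequent items."""
    total = sum(counter.values())
    top = sum(n for _, n in counter.most_common(k))
    return top / total if total else 0.0
```

With a real corpus, `coverage(cho, 6)` or `coverage(syllables, 315)` would give figures comparable to the percentages quoted above. The constant `N_SYLLABLES` also shows where the 15% figure comes from: 1,705 observed syllables out of 11,172 possible ones is about 15.3%.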

Abstract

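1. Introduction

2. Occurrence Frequencies of Phonemes (Chosung, Jungsung, Jongsung)

3. Syllable Frequencies

4. Word Occurrence Frequencies

5. Entropy of Phonemes, Syllables and Words

6. Conclusion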

References
