
한국어 통시 신문 말뭉치의 구축과 활용

Building a Diachronic Corpus of Korean Newspapers and its Application


This paper describes the construction of a Korean diachronic corpus based on articles published in Chosun Ilbo and Donga Ilbo from 1920 to 2019. Newspapers reflect not only the social but also the linguistic reality of their time, because they convey a wide range of information and ideas in the language of ordinary people. To understand this linguistic reality effectively, such data must be processed into a form that can be analyzed quantitatively. To that end, the spacing and orthography of some vocabulary items were modified to conform to current norms, and vocabulary listed in various dictionaries was added to the dictionary referenced by the morphological analyzer in order to improve the detection of lexical units. After this pre-processing, changes in linguistic form were investigated to demonstrate how the corpus can be used. Over the period covered, the mean number of syllables per word decreased and sentence length declined continuously. In addition, the proportion of Chinese characters in articles dropped, while the use of Hangul and the Latin alphabet increased.
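The form-level measures mentioned in the abstract (script proportions and mean syllables per word) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names are hypothetical, whitespace-separated tokens stand in for the word units that the paper obtains from a morphological analyzer, and each Hangul or Chinese character is assumed to count as one syllable.

```python
def classify_char(ch: str) -> str:
    """Assign a character to a script class using Unicode code point ranges."""
    code = ord(ch)
    if 0xAC00 <= code <= 0xD7A3:  # Hangul Syllables block
        return "hangul"
    if (0x4E00 <= code <= 0x9FFF      # CJK Unified Ideographs
            or 0x3400 <= code <= 0x4DBF   # CJK Extension A
            or 0xF900 <= code <= 0xFAFF):  # CJK Compatibility Ideographs
        return "hanja"
    if ch.isascii() and ch.isalpha():  # basic Latin letters only
        return "latin"
    return "other"


def script_proportions(text: str) -> dict[str, float]:
    """Share of Hangul, Chinese, and Latin characters among those three classes."""
    counts = {"hangul": 0, "hanja": 0, "latin": 0}
    for ch in text:
        kind = classify_char(ch)
        if kind in counts:
            counts[kind] += 1
    total = sum(counts.values())
    if total == 0:
        return {kind: 0.0 for kind in counts}
    return {kind: n / total for kind, n in counts.items()}


def mean_syllables_per_word(text: str) -> float:
    """Mean syllable count over whitespace tokens (an assumed stand-in for words).

    Tokens with no Hangul or Chinese characters are skipped.
    """
    lengths = []
    for token in text.split():
        n = sum(1 for ch in token if classify_char(ch) in ("hangul", "hanja"))
        if n > 0:
            lengths.append(n)
    return sum(lengths) / len(lengths) if lengths else 0.0
```

Applying these functions to the articles of each year and plotting the yearly values would give trend lines of the kind summarized in the abstract, under the simplifying assumptions stated above.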

