한의학 고문헌 텍스트 분석을 위한 비지도학습 기반 단어 추출 방법 비교
Comparison of Word Extraction Methods Based on Unsupervised Learning for Analyzing East Asian Traditional Medicine Texts
- Author: Oh Jun Ho (오준호)
- Vol. 32, No. 3
- Indexing status: KCI-listed
- pp. 47-57 (11 pages)
Objectives: We aim to support the choice of an appropriate word extraction method when analyzing East Asian traditional medicine texts with unsupervised learning. Methods: To rank substrings, we tested one method for exterior boundary values (BE: branching entropy), three methods for interior boundary values (CS: cohesion score, TS: t-score, SL: simple-ll), and six combinations of them (BExSL, BExTS, BExCS, CSxTS, CSxSL, TSxSL). Results: With Miss Rate (MR) as the criterion, the error was smallest when TS and SL were used together and largest when CS was used alone. When the number of segmented texts was applied as a weight, SL gave the best results and BE alone gave the worst. Conclusions: Unsupervised-learning-based word extraction can be used to analyze texts without a prepared vocabulary. When using this approach, SL or the combination of SL and TS should be considered first.
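The abstract names the four scores without giving their formulas. The sketch below shows how such scores are commonly computed, under assumptions that are not taken from the paper: branching entropy is taken as the entropy of the characters that follow a substring, the cohesion score as the geometric-mean-style ratio of a substring's count to the count of its first character, and t-score and simple-ll as the usual association measures over an observed frequency O and an expected frequency E. How the paper derives O and E for a substring, and how it combines scores (for example, whether "x" means a product), is not stated here, so those parts are left to the caller. The function names and the toy strings are hypothetical.

```python
import math
from collections import Counter

def substring_counts(texts, max_len=4):
    """Count every substring of length 1..max_len across the texts."""
    counts = Counter()
    for text in texts:
        for i in range(len(text)):
            for n in range(1, max_len + 1):
                if i + n <= len(text):
                    counts[text[i:i + n]] += 1
    return counts

def right_branching_entropy(texts, s):
    """Exterior boundary (assumed definition): entropy of the characters following s."""
    followers = Counter()
    for text in texts:
        start = text.find(s)
        while start != -1:
            end = start + len(s)
            if end < len(text):
                followers[text[end]] += 1
            start = text.find(s, start + 1)
    total = sum(followers.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in followers.values())

def cohesion_score(counts, s):
    """Interior boundary (assumed definition): (count(s) / count(s[0])) ** (1 / (len(s) - 1))."""
    if len(s) < 2 or counts[s[0]] == 0:
        return 0.0
    return (counts[s] / counts[s[0]]) ** (1 / (len(s) - 1))

def t_score(observed, expected):
    """t-score association measure: (O - E) / sqrt(O)."""
    if observed <= 0:
        return 0.0
    return (observed - expected) / math.sqrt(observed)

def simple_ll(observed, expected):
    """simple-ll association measure: 2 * (O * ln(O / E) - (O - E))."""
    if observed <= 0 or expected <= 0:
        return 0.0
    return 2 * (observed * math.log(observed / expected) - (observed - expected))

# Toy strings for illustration only; they are not from the paper's corpus.
texts = ["人蔘湯", "人蔘散", "人蔘湯"]
counts = substring_counts(texts)
print(cohesion_score(counts, "人蔘"))
print(right_branching_entropy(texts, "人蔘"))
```

In this toy input, "人蔘" always follows "人", so its cohesion is high, and it is followed by more than one distinct character, so its branching entropy is above zero. Both observations are consistent with treating it as a word candidate under these assumed definitions.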