Research on analysing Korean learners' vocabulary usage and selecting features to develop a proficiency prediction model

허원진; 김재욱

doi:10.22251/jlcci.2024.24.16.903

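Objectives: This study aims to analyse the vocabulary use patterns of Korean learners and, on that basis, to develop a proficiency prediction model. Methods: To this end, vocabulary use patterns were analysed in the learner spoken corpus with morphological annotation provided by the National Institute of Korean Language, and a prediction model was trained and evaluated with the random forest algorithm. Results: The random forest prediction model achieved an accuracy of 72.28% and a mean absolute percentage error (MAPE) of 16.33%, confirming that it reflects learners' vocabulary use patterns well and is effective for predicting proficiency. In particular, the feature combination including 'total word tokens', 'total word types', 'number of sentences', 'content-word tokens', 'content-word types', 'level-n word types', and 'lexical density 1 and lexical density 2' showed the highest performance. Conclusions: A random forest model with features such as token counts, type counts, and the number of sentences can predict Korean proficiency with high accuracy from learners' vocabulary use patterns, which can contribute to the development of automated learner assessment systems.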
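As a rough illustration of how the count-based features listed above might be computed, the sketch below assumes each transcript is a list of sentences, each a list of (morpheme, tag) pairs with Sejong-style POS tags such as NNG and VV (an assumption; the abstract does not name the tagset). The set of content-word tags, the `level_of` lookup standing in for a graded vocabulary list, and both lexical density formulas are hypothetical placeholders: the abstract does not define lexical density 1 and 2.

```python
from collections import Counter

# Hypothetical choice: Sejong-style tags treated as content words
# (common nouns, proper nouns, verbs, adjectives, general adverbs).
CONTENT_TAGS = {"NNG", "NNP", "VV", "VA", "MAG"}


def lexical_features(sentences, level_of):
    """Count-based vocabulary features for one learner transcript.

    sentences: list of sentences, each a list of (morpheme, tag) pairs.
    level_of:  hypothetical dict mapping (morpheme, tag) to a vocabulary level.
    """
    tokens = [pair for sent in sentences for pair in sent]
    types = set(tokens)
    content_tokens = [pair for pair in tokens if pair[1] in CONTENT_TAGS]
    content_types = set(content_tokens)
    # Distinct types per vocabulary level; types absent from the list are skipped.
    level_types = Counter(level_of[t] for t in types if t in level_of)
    n_tokens, n_types = len(tokens), len(types)
    return {
        "tokens": n_tokens,
        "types": n_types,
        "sentences": len(sentences),
        "content_tokens": len(content_tokens),
        "content_types": len(content_types),
        "level_types": dict(level_types),
        # Placeholder definitions; the abstract does not say how
        # lexical density 1 and 2 are computed.
        "lexical_density_1": len(content_tokens) / n_tokens if n_tokens else 0.0,
        "lexical_density_2": len(content_types) / n_types if n_types else 0.0,
    }


if __name__ == "__main__":
    # Toy transcript: "저는 학생이다. 학교에 가다." with a toy level list.
    transcript = [
        [("저", "NP"), ("는", "JX"), ("학생", "NNG"), ("이", "VCP"), ("다", "EF")],
        [("학교", "NNG"), ("에", "JKB"), ("가", "VV"), ("다", "EF")],
    ]
    levels = {("학생", "NNG"): 1, ("학교", "NNG"): 1, ("가", "VV"): 2}
    print(lexical_features(transcript, levels))
```

Counting (morpheme, tag) pairs rather than bare strings keeps homographs with different parts of speech as separate types; whether the original study did this is not stated.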

Objectives: The objective is to analyse the vocabulary use of Korean learners and to develop a proficiency prediction model based on the findings. Methods: To this end, we analysed vocabulary use patterns in the learner spoken corpus with morphological annotation provided by the National Institute of Korean Language, and trained and evaluated a prediction model using a random forest algorithm. Results: The random forest prediction model achieved 72.28% accuracy and a mean absolute percentage error (MAPE) of 16.33%, effectively reflecting learners' vocabulary use and predicting their proficiency. In particular, the combination of features including 'total number of word tokens', 'total number of word types', 'number of sentences', 'number of content-word tokens', 'number of content-word types', 'number of level-n word types', and 'lexical density 1 and 2' performed best. Conclusions: A random forest model using features such as token counts, type counts, and the number of sentences can accurately predict Korean language proficiency from learners' vocabulary use patterns, which can contribute to the development of automated learner assessment systems.
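The two reported metrics can be made concrete with their standard definitions: accuracy is the share of exact matches, and MAPE is the mean of |y - ŷ| / |y| expressed as a percentage. The sketch below assumes proficiency levels are encoded as positive integers (for example 1 to 6); the abstract does not say how levels were encoded or how MAPE was computed over them, so this illustrates the formulas rather than the authors' procedure.

```python
def accuracy(y_true, y_pred):
    """Fraction of learners whose predicted level equals the true level."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mape(y_true, y_pred):
    """Mean absolute percentage error in percent; true values must be nonzero."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return 100 * sum(abs(t - p) / abs(t) for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    true_levels = [1, 2, 3, 4]  # hypothetical encoded proficiency levels
    pred_levels = [1, 2, 4, 4]
    print(accuracy(true_levels, pred_levels))  # 0.75
    print(mape(true_levels, pred_levels))
```

Because MAPE divides by the true value, a one-level miss weighs more at low levels than at high ones (a miss at level 1 is a 100% error, at level 6 about 16.7%), which is worth keeping in mind when reading a single MAPE figure alongside accuracy.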
