Game Tag Extraction Model for Korean Game Classification

이한결; 이재욱; 박소영

doi:10.21493/kscg.2016.29.2.1

As the number of games distributed through software distribution networks (ESD) grows, users have increasing difficulty finding the games they want. Tags that describe each game make such searches easier, which creates a need for a model that generates them. In this paper, we propose a method that automatically generates game keyword tags from a game description using the deep learning model BERT. The method takes 100 representative game keyword tags from the game publishing platform Steam, performs a binary classification for each tag on the input text, and selects the 4 tags with the highest Softmax scores. To improve accuracy, we take a Multilingual BERT model pretrained on approximately 3.3 billion multilingual words and a KoBERT model pretrained on approximately 50 million Korean words, and fine-tune both on a Korean game description set. In our experiments, the BERT model outperforms the KoBERT model by 9.19 % in F-score. This suggests that the size of the language training data set is more important than specialization in a specific language, here Korean.
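
The tag selection step described above (one binary classification per tag, then keeping the 4 tags with the highest Softmax scores) can be illustrated with a minimal sketch. It assumes that each tag's classifier emits a (negative, positive) logit pair and that the "Softmax score" is the positive-class probability; the abstract does not spell this out. The function names, tag names, and logit values are hypothetical, and the fine-tuned BERT or KoBERT classifiers that would produce the logits are not shown.

```python
import math
from typing import Dict, List, Tuple


def positive_probability(logits: Tuple[float, float]) -> float:
    """Softmax probability of the positive class for a (negative, positive) logit pair."""
    neg, pos = logits
    m = max(neg, pos)  # subtract the max for numerical stability
    e_neg = math.exp(neg - m)
    e_pos = math.exp(pos - m)
    return e_pos / (e_neg + e_pos)


def select_top_tags(
    tag_logits: Dict[str, Tuple[float, float]], k: int = 4
) -> List[Tuple[str, float]]:
    """Score every tag independently and keep the k highest-scoring tags."""
    scored = [(tag, positive_probability(pair)) for tag, pair in tag_logits.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Hypothetical logits for six tags; the paper uses 100 tags taken from Steam.
    example_logits = {
        "Action": (-1.0, 2.0),
        "RPG": (0.5, 0.2),
        "Indie": (-0.5, 1.5),
        "Puzzle": (1.0, -1.0),
        "Strategy": (0.0, 1.0),
        "Casual": (-2.0, 0.5),
    }
    for tag, score in select_top_tags(example_logits, k=4):
        print(f"{tag}: {score:.3f}")
```

Because each tag is scored by its own binary decision, the scores do not sum to one across tags; ranking them simply picks the tags whose classifiers are most confident.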
