
A Study on Learning Method for Korean Speech Data Using Limited Computing Resource

인공지능연구 (KJAI), Vol. 13, No. 2

In light of increasing concerns over carbon emissions and power supply in the field of artificial intelligence, this study fine-tunes a large language model (LLM) on Korean spoken language data using small-scale computing resources and evaluates the performance of the resulting supervised model. The study proposes an efficient method for limiting computing resource usage and conducts training on that limited infrastructure. Korean spoken language data was then collected. The dataset was designed to enable the model to understand a wide range of questions and provide appropriate answers; it consists of general-knowledge sentence generation data, book summary information, academic paper summary data, and document summarization data. Because of the phonological changes, frequent subject omission, and honorifics unique to Korean, it is difficult to achieve satisfactory performance with existing English-based LLM training methods alone. This study distinguishes itself from prior work by selectively leveraging a dataset that reflects the linguistic characteristics of Korean, thereby proposing a language-specialized fine-tuning data strategy. For methodology, we fine-tuned the open-source Llama-3.1-8B-Instruct model using LoRA (Low-Rank Adaptation of Large Language Models) via Unsloth. The fine-tuned model achieved an average score of 43.33 on the Open Ko-LLM Leaderboard. Notably, it scored 61.17 on Ko-Winogrande, which assesses logical reasoning, and 58.3 on Ko-GSM8k, which evaluates mathematical problem solving, demonstrating competitive performance against other open-source models. These results suggest a practical alternative to large-scale resource-based models in terms of both resource efficiency and linguistic suitability.
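The abstract names the method (LoRA via Unsloth on Llama-3.1-8B-Instruct) but contains no code, so the following is a minimal sketch of that setup. The model identifier, dataset file, and all hyperparameters (rank, batch size, learning rate, sequence length) are illustrative assumptions, not values taken from the study.

    # Minimal sketch: LoRA fine-tuning via Unsloth on Llama-3.1-8B-Instruct.
    # All hyperparameters and the dataset path are assumptions for illustration.
    from unsloth import FastLanguageModel
    from trl import SFTTrainer
    from transformers import TrainingArguments
    from datasets import load_dataset

    # Load the base model in 4-bit quantization to fit small-scale GPUs.
    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name="unsloth/Meta-Llama-3.1-8B-Instruct",
        max_seq_length=2048,
        load_in_4bit=True,
    )

    # Attach low-rank adapter matrices; only these are trained, so the
    # memory and compute footprint stays far below full fine-tuning.
    model = FastLanguageModel.get_peft_model(
        model,
        r=16,                      # LoRA rank (assumed, not from the paper)
        lora_alpha=16,
        lora_dropout=0,
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                        "gate_proj", "up_proj", "down_proj"],
        use_gradient_checkpointing="unsloth",  # trades compute for memory
    )

    # Placeholder: a Korean spoken-language SFT corpus with a "text" field.
    dataset = load_dataset("json", data_files="korean_sft.jsonl", split="train")

    trainer = SFTTrainer(
        model=model,
        tokenizer=tokenizer,
        train_dataset=dataset,
        dataset_text_field="text",
        max_seq_length=2048,
        args=TrainingArguments(
            per_device_train_batch_size=2,
            gradient_accumulation_steps=8,  # effective batch size of 16
            num_train_epochs=1,
            learning_rate=2e-4,
            fp16=True,
            output_dir="outputs",
        ),
    )
    trainer.train()

The combination of 4-bit loading, gradient checkpointing, and small per-device batches with gradient accumulation is the standard way such a setup keeps an 8B-parameter model trainable on a single consumer-grade GPU, which matches the paper's stated goal of limited computing resources.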

1. Introduction

2. Related Work

3. Computing Resources and Data

4. Tokenizer Configuration Method

5. Conclusion

References
