Dynamic Subtitle Allocation: A Practical System with Speaker Separation
- Korean Association of Artificial Intelligence (한국인공지능학회)
- Korean Journal of Artificial Intelligence
- Vol. 13 No. 1
- 2025.03, pp. 27-34 (8 pages)
- DOI: 10.24225/kjai.2025.13.1.27
Subtitles help viewers understand a video's content by displaying spoken words or their translations. However, when multiple speakers talk simultaneously, it is difficult to tell from the subtitles alone who is speaking. In addition, most subtitles are positioned at the bottom of the frame, which can pull the viewer's attention away from the action. To address these issues, we develop a dynamic subtitle allocation pipeline that positions subtitles near the speaker's face. Each speaker's face is cropped using YOLOv3-based face detection, after which a lip-reading model associates each ground-truth subtitle with the correct speaker. Experimental results show that our method reliably assigns subtitles to the corresponding speakers, even in multi-speaker scenarios, demonstrating its robustness under complex audio-visual conditions. This practical approach improves accessibility, particularly for viewers with hearing impairments, and enhances the viewing experience by keeping subtitles synchronized with their speakers. Furthermore, our system operates on standard single-channel audio and requires no additional pre-processing, making it easy to integrate into existing content and deploy in real-world applications.
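
The abstract outlines a two-stage pipeline: detect and crop faces with YOLOv3, then use lip reading to decide which face a subtitle belongs to. The Python sketch below illustrates that flow under stated assumptions: the detector is loaded from hypothetical Darknet files ("face.cfg", "face.weights"), and lip_score is a crude motion-energy placeholder standing in for the paper's lip-reading model. Neither reflects the authors' actual implementation.

import cv2
import numpy as np

# Hypothetical Darknet export of a YOLOv3 face detector (illustrative paths).
net = cv2.dnn.readNetFromDarknet("face.cfg", "face.weights")

def detect_faces(frame, conf_thresh=0.5):
    # Return face bounding boxes (x, y, w, h) detected in one video frame.
    h, w = frame.shape[:2]
    blob = cv2.dnn.blobFromImage(frame, 1 / 255.0, (416, 416), swapRB=True, crop=False)
    net.setInput(blob)
    boxes = []
    for out in net.forward(net.getUnconnectedOutLayersNames()):
        for det in out:
            if float(det[5:].max()) >= conf_thresh:
                cx, cy, bw, bh = det[:4] * np.array([w, h, w, h])
                boxes.append((max(0, int(cx - bw / 2)), max(0, int(cy - bh / 2)),
                              int(bw), int(bh)))
    return boxes

def lip_score(face_frames, subtitle_text):
    # Placeholder for the paper's lip-reading model: scores mouth activity as
    # the mean frame-to-frame pixel change in the lower half of the face crop.
    # A real model would also condition on subtitle_text, which this crude
    # proxy ignores.
    gray = [cv2.cvtColor(f, cv2.COLOR_BGR2GRAY) for f in face_frames]
    mouths = [g[g.shape[0] // 2:] for g in gray]
    return float(np.mean([np.abs(a.astype(int) - b.astype(int)).mean()
                          for a, b in zip(mouths, mouths[1:])]))

def assign_subtitle(frames, subtitle_text):
    # Crop every detected face across the clip, score each crop against the
    # subtitle, and return the box of the best-matching speaker so the
    # subtitle can be rendered next to that face.
    boxes = detect_faces(frames[0])
    crops = [[f[y:y + h, x:x + w] for f in frames] for (x, y, w, h) in boxes]
    scores = [lip_score(c, subtitle_text) for c in crops]
    return boxes[int(np.argmax(scores))]

Given the winning box (x, y, w, h), the subtitle can then be drawn just below the face, e.g. cv2.putText(frame, text, (x, y + h + 20), cv2.FONT_HERSHEY_SIMPLEX, 0.7, (255, 255, 255), 2), rather than at a fixed position at the bottom of the frame.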
1. Introduction
2. Related Research
3. Methodology
4. Results
5. Conclusion and Future Work
References