Dynamic Subtitle Allocation: A Practical System with Speaker Separation
- Korean Association of Artificial Intelligence (한국인공지능학회)
- Korean Journal of Artificial Intelligence
- Vol. 13 No. 1
- 2025.03, pp. 27-34 (8 pages)
- DOI: 10.24225/kjai.2025.13.1.27
Subtitles help viewers understand a video's content by displaying spoken words or their translations. However, when multiple speakers talk simultaneously, it is difficult to tell from the subtitles alone who is speaking. In addition, most subtitles are positioned at the bottom of the frame, which can pull the viewer's attention away from the action. To address these issues, we develop a dynamic subtitle allocation pipeline that positions subtitles near the speaker's face. Each speaker's face is cropped using YOLOv3-based face detection, after which a lip-reading model associates each ground-truth subtitle with the correct speaker. Experimental results show that our method reliably assigns subtitles to the corresponding speakers, even in multi-speaker scenarios, demonstrating its robustness under complex audio-visual conditions. This practical approach improves accessibility, particularly for viewers with hearing impairments, and enhances the viewing experience by keeping subtitles synchronized with their speakers. Furthermore, our system operates on standard single-channel audio and requires no additional pre-processing, making it easy to integrate into existing content and deploy in real-world applications.
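
The abstract outlines a two-stage pipeline: detect and crop faces with YOLOv3, then use lip reading to decide which face a subtitle belongs to. The Python sketch below illustrates that flow under stated assumptions: the detector is loaded from hypothetical Darknet files ("face.cfg", "face.weights"), and lip_score is a crude motion-energy placeholder standing in for the paper's lip-reading model. Neither reflects the authors' actual implementation.

import cv2
import numpy as np

# Hypothetical Darknet export of a YOLOv3 face detector (illustrative paths).
net = cv2.dnn.readNetFromDarknet("face.cfg", "face.weights")

def detect_faces(frame, conf_thresh=0.5):
    # Return face bounding boxes (x, y, w, h) detected in one video frame.
    h, w = frame.shape[:2]
    blob = cv2.dnn.blobFromImage(frame, 1 / 255.0, (416, 416), swapRB=True, crop=False)
    net.setInput(blob)
    boxes = []
    for out in net.forward(net.getUnconnectedOutLayersNames()):
        for det in out:
            if float(det[5:].max()) >= conf_thresh:
                cx, cy, bw, bh = det[:4] * np.array([w, h, w, h])
                boxes.append((max(0, int(cx - bw / 2)), max(0, int(cy - bh / 2)),
                              int(bw), int(bh)))
    return boxes

def lip_score(face_frames, subtitle_text):
    # Placeholder for the paper's lip-reading model: scores mouth activity as
    # the mean frame-to-frame pixel change in the lower half of the face crop.
    # A real model would also condition on subtitle_text, which this crude
    # proxy ignores.
    gray = [cv2.cvtColor(f, cv2.COLOR_BGR2GRAY) for f in face_frames]
    mouths = [g[g.shape[0] // 2:] for g in gray]
    return float(np.mean([np.abs(a.astype(int) - b.astype(int)).mean()
                          for a, b in zip(mouths, mouths[1:])]))

def assign_subtitle(frames, subtitle_text):
    # Crop every detected face across the clip, score each crop against the
    # subtitle, and return the box of the best-matching speaker so the
    # subtitle can be rendered next to that face.
    boxes = detect_faces(frames[0])
    crops = [[f[y:y + h, x:x + w] for f in frames] for (x, y, w, h) in boxes]
    scores = [lip_score(c, subtitle_text) for c in crops]
    return boxes[int(np.argmax(scores))]

Given the winning box (x, y, w, h), the subtitle can then be drawn just below the face, e.g. cv2.putText(frame, text, (x, y + h + 20), cv2.FONT_HERSHEY_SIMPLEX, 0.7, (255, 255, 255), 2), rather than at a fixed position at the bottom of the frame.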
1. Introduction
2. Related Research
3. Methodology
4. Results
5. Conclusion and Future Work
References