
Dynamic Subtitle Allocation: A Practical System with Speaker Separation

인공지능연구 (KJAI), Vol. 13, No. 1

Subtitles help viewers understand the contents of a video by displaying spoken words or their translations. However, when multiple speakers speak simultaneously, it becomes difficult to determine who is speaking through subtitles alone. Additionally, most subtitles are positioned below the scene, which can disrupt visual focus. To address these issues, we develop a dynamic subtitle allocation pipeline that positions subtitles near the speaker’s face. A speaker’s face is cropped using YOLOv3-based face detection, followed by a lip-reading model associating ground-truth subtitles with the correct speaker. Experimental results show that our method effectively assigns subtitles to the corresponding speakers, even in multi-speaker scenarios, demonstrating its robustness in handling complex audio-visual conditions. This practical approach enhances accessibility, particularly for viewers with hearing impairments, and improves the viewing experience by ensuring seamless subtitle synchronization with speakers. Furthermore, our system operates with standard single-channel audio and requires no additional pre-processing, making it easy to integrate into existing content and deploy in real-world applications.
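The core allocation step described above (pick the active speaker among detected faces, then render the subtitle next to that speaker's face) can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the `Box` type, the `assign_speaker` stand-in for the lip-reading model's per-face scores, and the placement heuristic in `place_subtitle` are all assumptions for illustration; the actual system obtains faces from YOLOv3 detection and speaker evidence from a lip-reading model.

```python
from dataclasses import dataclass

@dataclass
class Box:
    x: int  # top-left corner, pixels
    y: int
    w: int
    h: int

def assign_speaker(lip_scores):
    """Pick the face with the highest lip-motion score.

    In the paper's system this score would come from the lip-reading
    model matching ground-truth subtitles to mouth movement; here it
    is just a list of floats (hypothetical stand-in).
    """
    return max(range(len(lip_scores)), key=lambda i: lip_scores[i])

def place_subtitle(face: Box, frame_w: int, frame_h: int,
                   sub_w: int, sub_h: int, margin: int = 8):
    """Anchor the subtitle just below the speaker's face box,
    horizontally centred on it, clamped to stay inside the frame.
    If there is no room below the face, flip the subtitle above it.
    """
    x = face.x + face.w // 2 - sub_w // 2
    y = face.y + face.h + margin
    x = max(0, min(x, frame_w - sub_w))
    if y + sub_h > frame_h:  # no room below: place above the face
        y = max(0, face.y - margin - sub_h)
    return x, y

# Two detected faces; the lip-reading scores say face 1 is speaking.
speaker = assign_speaker([0.2, 0.9])   # -> 1
faces = [Box(400, 120, 90, 90), Box(100, 100, 80, 80)]
pos = place_subtitle(faces[speaker], 1280, 720, 200, 40)  # -> (40, 188)
```

The clamping and flip-above fallback keep the subtitle on-screen even when the speaker's face sits near a frame edge, which is the case that a fixed bottom-of-screen layout never has to handle.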

1. Introduction

2. Related Research

3. Methodology

4. Results

5. Conclusion and Future Work

References
