Performance Improvement of Mean-Teacher Models in Audio Event Detection Using Derivative Features

곽진열; 정용주

doi:10.13067/JKIECS.2021.16.3.401

Mean-teacher models based on the convolutional recurrent neural network (CRNN) architecture have recently become a standard approach to audio event detection. A mean-teacher model consists of two parallel CRNNs and uses the consistency between their outputs as a training criterion, which allows it to learn effectively from weakly labeled and unlabeled audio data as well. In this study, we sought to improve a state-of-the-art mean-teacher model by additionally using derivative features of the log-mel spectrum. In audio event detection experiments using the training and test data of DCASE 2018/2019 Challenge Task 4, the mean-teacher model with the proposed derivative features achieved a relative reduction in error rate (ER) of up to 8.1% compared with the conventional approach.
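The two ideas summarized above, derivative features of the log-mel spectrum and consistency training between two parallel networks, can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The abstract does not state the derivative order, the regression window, the consistency measure, or the weight-update rule, so the sketch assumes the common regression-based delta formula, a mean-squared-error consistency cost, and the exponential-moving-average (EMA) teacher update usually associated with mean-teacher training. All function names and default values (`delta`, `stack_with_deltas`, `ema_update`, `consistency_cost`, `width=2`, `alpha=0.999`) are hypothetical.

```python
import numpy as np


def delta(features: np.ndarray, width: int = 2) -> np.ndarray:
    """Regression-based derivative along the time axis (axis 0).

    features has shape (frames, n_mels). The window half-width is an
    assumed value, not one taken from the paper.
    d[t] = sum_n n * (c[t+n] - c[t-n]) / (2 * sum_n n^2), n = 1..width
    """
    n_frames = features.shape[0]
    # Repeat the edge frames so every frame has `width` neighbours on each side.
    padded = np.pad(features, ((width, width), (0, 0)), mode="edge")
    denom = 2.0 * sum(n * n for n in range(1, width + 1))
    out = np.zeros(features.shape, dtype=float)
    for n in range(1, width + 1):
        ahead = padded[width + n : width + n + n_frames]
        behind = padded[width - n : width - n + n_frames]
        out += n * (ahead - behind)
    return out / denom


def stack_with_deltas(log_mel: np.ndarray, order: int = 2, width: int = 2) -> np.ndarray:
    """Stack static, delta, ... features as channels: (order + 1, frames, n_mels)."""
    channels = [log_mel.astype(float)]
    for _ in range(order):
        channels.append(delta(channels[-1], width))
    return np.stack(channels, axis=0)


def ema_update(teacher: dict, student: dict, alpha: float = 0.999) -> dict:
    """Assumed teacher update: exponential moving average of student weights."""
    return {k: alpha * teacher[k] + (1.0 - alpha) * student[k] for k in teacher}


def consistency_cost(student_probs: np.ndarray, teacher_probs: np.ndarray) -> float:
    """Assumed consistency measure: mean squared error between the two outputs."""
    return float(np.mean((student_probs - teacher_probs) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    log_mel = rng.normal(size=(100, 64))   # stand-in for a real log-mel spectrogram
    x = stack_with_deltas(log_mel)         # multi-channel input for a CRNN
    print(x.shape)
```

Stacking the derivatives as extra input channels is one plausible way to feed them to a CRNN; concatenating them along the frequency axis would be another, and the abstract does not say which was used.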
