Development of Efficient Video Recognition Model Mimicking Brain's 'Selection and Concentration' Principle

김준경; 조영임

doi:10.34139/JSCS.2025.15.4.67

Recent deep learning-based video recognition technologies, driven by advances in deep neural networks such as 3D CNNs, have achieved superhuman accuracy. However, the growing scale of these models has led to massive computational costs and power consumption. Furthermore, the "black-box" nature of their complex inference processes limits their application in high-reliability fields such as autonomous driving and healthcare. To overcome these limitations, this study proposes a novel video recognition model that achieves both intelligent efficiency and explainability by applying two principles of the biological brain, 'Selective Attention' and 'Functional Specialization', to deep neural network design. We first replicated the study by Hiramoto & Cline (2024) and analyzed neural data from the tadpole optic tectum in depth using unsupervised learning techniques. This analysis statistically verified that tectal neurons differentiate into 'expert groups', each responding only to specific spatiotemporal patterns such as static backgrounds, horizontal movements, or complex rotations. To implement these biomimetic principles, we designed the Spatially Adaptive MovieNet, which actively selects the optimal computational path by analyzing the dynamic complexity of each input video in real time. Its core component, the Intelligent Gating Module, detects high-information regions within the video. Instead of computing a probability-based weighted sum of both paths, it applies a Winner-Takes-All mechanism that physically executes only one computational path, either 2D (static) or 3D (dynamic), which delivers a real computational speedup.
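The difference between soft gating and Winner-Takes-All gating can be sketched in plain Python. This is a minimal, hypothetical illustration, not the authors' Spatially Adaptive MovieNet: the motion proxy `dynamic_complexity` (mean absolute frame difference), the hand-set `threshold`, and the toy `path_2d`/`path_3d` functions are all assumptions standing in for learned components, and the sketch gates whole clips, whereas the module described above works on regions within the video. It only shows that a soft gate must run both paths to mix them, while a hard gate runs the winning path alone.

```python
import math
from typing import List, Tuple

Frame = List[List[float]]  # H x W grayscale frame
Clip = List[Frame]         # T frames


def dynamic_complexity(clip: Clip) -> float:
    """Hypothetical motion proxy: mean absolute frame-to-frame pixel change."""
    diffs = [abs(b - a)
             for prev, cur in zip(clip, clip[1:])
             for row_p, row_c in zip(prev, cur)
             for a, b in zip(row_p, row_c)]
    return sum(diffs) / len(diffs) if diffs else 0.0


def path_2d(clip: Clip, log: List[str]) -> float:
    """Toy 'static' path: mean intensity, blind to temporal order."""
    log.append("2d")
    total = sum(sum(map(sum, frame)) for frame in clip)
    count = sum(len(frame) * len(frame[0]) for frame in clip)
    return total / count


def path_3d(clip: Clip, log: List[str]) -> float:
    """Toy 'dynamic' path: responds only to change across frames."""
    log.append("3d")
    return dynamic_complexity(clip)


def gate_scores(clip: Clip, threshold: float) -> Tuple[float, float]:
    """(static, dynamic) scores; `threshold` is a hypothetical hand-set knob."""
    c = dynamic_complexity(clip)
    return threshold - c, c - threshold


def soft_gate(clip: Clip, threshold: float, log: List[str]) -> float:
    """Probability-weighted sum: both paths must run before mixing."""
    s, d = gate_scores(clip, threshold)
    m = max(s, d)
    ws, wd = math.exp(s - m), math.exp(d - m)
    z = ws + wd
    return (ws / z) * path_2d(clip, log) + (wd / z) * path_3d(clip, log)


def hard_gate(clip: Clip, threshold: float, log: List[str]) -> float:
    """Winner-Takes-All: only the higher-scoring path is executed."""
    s, d = gate_scores(clip, threshold)
    return path_3d(clip, log) if d > s else path_2d(clip, log)


# Toy 4-frame, 2x2 clips: one constant, one whose pixels flip every frame.
STATIC_CLIP: Clip = [[[0.5, 0.5], [0.5, 0.5]] for _ in range(4)]
MOVING_CLIP: Clip = [[[float((t + x) % 2) for x in range(2)] for _ in range(2)]
                     for t in range(4)]

if __name__ == "__main__":
    for name, clip in (("static", STATIC_CLIP), ("moving", MOVING_CLIP)):
        soft_log: List[str] = []
        hard_log: List[str] = []
        soft_gate(clip, 0.1, soft_log)
        hard_gate(clip, 0.1, hard_log)
        print(name, "soft ran", soft_log, "| hard ran", hard_log)
```

Because the hard gate calls only one path, the cost of the skipped branch is never paid, while the soft gate always pays for both. An argmax is not differentiable, and the abstract does not say how the gate is trained; straight-through estimators or Gumbel-softmax are common options, not confirmed details of this work.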
Furthermore, a multi-objective learning strategy that includes a Sparsity Loss was established; by enforcing sparsity in the attention maps, it induces the model to focus on motion, the key feature of the data. Comparative experiments against a Standard 3D CNN showed that the proposed model achieved the same top classification accuracy (98.84%) while reducing the number of parameters by approximately 98% (from 7.35M to 0.14M) and the computational cost (FLOPs) by about 19% (from 0.27G to 0.22G). Notably, the proposed model showed adaptive behavior, selecting the lighter computational mode on its own for easy data, and, owing to the sparsity loss, its attention maps accurately visualized the motion trajectories of the subject, making the basis for its decisions explicit. By integrating neuroscientific insights into a deep learning architecture, this study demonstrates Intelligent Efficiency and offers an important direction for future lightweight AI research targeting low-power edge devices.
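The abstract does not give the exact form of the Sparsity Loss or of the other objectives in the multi-objective strategy. The sketch below assumes the simplest common choice: a softmax cross-entropy classification term plus a mean-L1 penalty on the attention map, weighted by a hypothetical coefficient `lam`. It illustrates why such a penalty prefers a focused attention map over a diffuse one when the classification loss is equal.

```python
import math
from typing import List, Sequence

AttentionMap = Sequence[Sequence[float]]  # H x W attention weights, assumed in [0, 1]


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """Softmax cross-entropy for a single sample (log-sum-exp for stability)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_z - logits[label]


def sparsity_loss(attention: AttentionMap) -> float:
    """Mean absolute attention value; lower means fewer active regions."""
    values = [abs(a) for row in attention for a in row]
    return sum(values) / len(values)


def total_loss(logits: Sequence[float], label: int,
               attention: AttentionMap, lam: float = 0.1) -> float:
    """Hypothetical multi-objective loss: classification + lam * sparsity."""
    return cross_entropy(logits, label) + lam * sparsity_loss(attention)


# Toy 4x4 attention maps: one spread everywhere, one on a single "moving" cell.
DIFFUSE: List[List[float]] = [[0.5] * 4 for _ in range(4)]
FOCUSED: List[List[float]] = [[0.0] * 4 for _ in range(4)]
FOCUSED[1][2] = 1.0

if __name__ == "__main__":
    logits = [2.0, 0.0]  # same classifier output in both cases
    print("diffuse:", total_loss(logits, 0, DIFFUSE))  # larger total
    print("focused:", total_loss(logits, 0, FOCUSED))  # smaller total
```

With identical logits, the only difference between the two totals is the sparsity term, so gradient descent on this loss pushes attention toward a few high-value regions, which is the behavior the abstract attributes to its Sparsity Loss.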