Understanding Conv3D-Based Video Violence Detection Network Using Optical Flow and RGB Data

This paper introduces a Conv3D-based network for video violence detection that integrates optical flow and RGB data to improve both the accuracy and the efficiency of detecting violent behavior. The model is built on a ResNet-3D backbone with an attention mechanism that focuses on the most informative frames of a violent event. Optical flow captures motion between frames, while RGB data provides the static visual context of each frame; the network combines both sources to analyze video content more completely.
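The summary does not specify how the RGB and optical-flow data are combined or how the attention weights are computed, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes early fusion by stacking 3 RGB channels and 2 flow channels (dx, dy) into a 5-channel frame, shows what a single 3D convolution does (a kernel sliding over time, height, and width at once), and pools per-frame features with softmax attention over hypothetical per-frame scores. The helper names `fuse_rgb_flow`, `conv3d_valid`, and `temporal_attention_pool` are illustrative; a real ResNet-3D stacks many such convolutions in residual blocks, typically via a framework layer such as PyTorch's `torch.nn.Conv3d`.

```python
import math
from typing import List

# A clip is a nested list indexed as [T][C][H][W]: frames, channels, height, width.
Clip = List[List[List[List[float]]]]


def fuse_rgb_flow(rgb: Clip, flow: Clip) -> Clip:
    """Assumed early fusion: stack RGB (3 ch) and optical flow (2 ch: dx, dy)
    along the channel axis, giving 5 channels per frame.

    Flow between consecutive frames yields T-1 fields; this sketch assumes
    the flow has already been padded or aligned to T frames.
    """
    if len(rgb) != len(flow):
        raise ValueError("RGB and flow clips must have the same number of frames")
    return [rgb_frame + flow_frame for rgb_frame, flow_frame in zip(rgb, flow)]


def conv3d_valid(clip: Clip, kernel: Clip) -> List[List[List[float]]]:
    """Naive 3D convolution (cross-correlation, as in deep-learning libraries).

    kernel is [kT][C][kH][kW]; the window slides over time, height and width
    together, so each output value mixes spatial and temporal neighbours.
    Stride 1, no padding, one output channel: shape [T-kT+1][H-kH+1][W-kW+1].
    """
    T, C, H, W = len(clip), len(clip[0]), len(clip[0][0]), len(clip[0][0][0])
    kT, kC, kH, kW = len(kernel), len(kernel[0]), len(kernel[0][0]), len(kernel[0][0][0])
    if kC != C:
        raise ValueError("kernel channels must match clip channels")
    out = []
    for t in range(T - kT + 1):
        plane = []
        for y in range(H - kH + 1):
            row = []
            for x in range(W - kW + 1):
                acc = 0.0
                for dt in range(kT):
                    for c in range(C):
                        for dy in range(kH):
                            for dx in range(kW):
                                acc += clip[t + dt][c][y + dy][x + dx] * kernel[dt][c][dy][dx]
                row.append(acc)
            plane.append(row)
        out.append(plane)
    return out


def temporal_attention_pool(frame_feats: List[List[float]], scores: List[float]) -> List[float]:
    """Softmax the per-frame scores into weights, then return the weighted
    sum of per-frame feature vectors, so high-scoring frames dominate."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(frame_feats[0])
    return [sum(w * f[i] for w, f in zip(weights, frame_feats)) for i in range(dim)]


if __name__ == "__main__":
    T, H, W = 4, 3, 3
    rgb = [[[[1.0] * W for _ in range(H)] for _ in range(3)] for _ in range(T)]
    flow = [[[[0.5] * W for _ in range(H)] for _ in range(2)] for _ in range(T)]
    clip = fuse_rgb_flow(rgb, flow)  # 4 frames x 5 channels x 3 x 3
    kernel = [[[[1.0] * 2 for _ in range(2)] for _ in range(5)] for _ in range(2)]  # 2 x 5 x 2 x 2
    out = conv3d_valid(clip, kernel)
    print(len(out), len(out[0]), len(out[0][0]), out[0][0][0])  # 3 2 2 32.0
    print(temporal_attention_pool([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]))  # [0.5, 0.5]
```

In a trained model the attention scores would come from a learned layer rather than being passed in; here they are supplied by hand so the weighting behavior is easy to see.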
The model was evaluated on four datasets, UBI-FIGHT, Hockey, Crowd, and Movie-Fights, where it achieved area under the ROC curve (AUC) scores of 95.4, 98.1, 94.5, and 100.0, respectively. The results show that the proposed method outperforms existing state-of-the-art violence detection techniques on these benchmarks. The model targets real-time surveillance systems and may also be applied to broader video analysis and understanding tasks. The key contributions are: (1) extracting motion information with optical flow, (2) capturing visual context with RGB data, and (3) a model that learns the relationship between the integrated RGB and optical-flow frames. Overall, the study shows that combining optical flow and RGB data in a Conv3D-based ResNet-3D model with an attention module improves the detection of violent behavior across diverse environments.
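The AUC figures above appear to be reported on a 0 to 100 scale (for example, 95.4 corresponding to 0.954). The summary does not describe the evaluation protocol, such as whether scores are computed per frame or per video, so the sketch below only illustrates what the metric measures, using made-up labels and scores. It relies on the rank (Mann-Whitney) interpretation of ROC AUC: the probability that a randomly chosen violent sample receives a higher score than a randomly chosen non-violent one. The function name `roc_auc` is hypothetical.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via its rank (Mann-Whitney) interpretation: the probability that
    a randomly chosen positive (violent, label 1) sample scores higher than a
    randomly chosen negative (non-violent, label 0) one. Ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    labels = [0, 0, 1, 1]
    scores = [0.1, 0.4, 0.35, 0.8]  # made-up classifier outputs
    auc = roc_auc(labels, scores)
    print(auc, round(100 * auc, 1))  # 0.75 75.0
```

The pairwise loop costs O(P·N) and is meant only for clarity; for large evaluation sets a sort-based computation or an existing routine such as scikit-learn's `roc_auc_score` is more practical.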

Conv3D-Based Video Violence Detection Network Using Optical Flow and RGB Data

5 January 2024 | Jae-Hyuk Park, Mohamed Mahmoud, Hyun-Soo Kang