5 January 2024 | Jae-Hyuk Park, Mohamed Mahmoud, Hyun-Soo Kang
This paper introduces a Conv3D-based video violence detection network that integrates optical flow and RGB data to enhance the accuracy and efficiency of violence detection in real-time surveillance systems. The proposed model, based on ResNet-3D, leverages optical flow to capture spatiotemporal features and RGB data to provide spatial context, improving the understanding of violent scenarios. An attention mechanism is integrated to focus on crucial frames during violent events, enhancing the model's performance. The model was evaluated on four datasets (UBI-Fight, Hockey, Crowd, and Movie-Fights) and outperformed existing state-of-the-art techniques, achieving high area under the curve scores. The research highlights the potential of this approach in real-time surveillance and contributes to broader research in video analysis and understanding.
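The abstract does not specify how the two streams are fused or how the frame attention is computed, so the following is only a minimal, hypothetical sketch of the general idea: per-frame RGB and optical-flow features are combined, each frame is scored, and a softmax over time weights the frames before classification. It assumes per-frame feature vectors have already been produced by a 3D CNN backbone (not shown). The function names, fusion by concatenation, dot-product scoring against a vector `w`, and the logistic classifier with weights `v` are illustrative assumptions, not the authors' design.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a 1-D sequence of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_streams(rgb: List[Vector], flow: List[Vector]) -> List[Vector]:
    """Hypothetical fusion: concatenate RGB and optical-flow features frame by frame."""
    if len(rgb) != len(flow):
        raise ValueError("both streams must have the same number of frames")
    return [r + f for r, f in zip(rgb, flow)]


def temporal_attention_pool(frames: List[Vector], w: Vector) -> Tuple[Vector, List[float]]:
    """Score each frame by a dot product with w, softmax over time,
    and return the attention-weighted average feature plus the weights."""
    scores = [sum(x * wi for x, wi in zip(frame, w)) for frame in frames]
    alphas = softmax(scores)
    dim = len(frames[0])
    pooled = [sum(a * frame[d] for a, frame in zip(alphas, frames)) for d in range(dim)]
    return pooled, alphas


def violence_probability(pooled: Vector, v: Vector, b: float = 0.0) -> float:
    """Hypothetical logistic classifier on the pooled clip descriptor."""
    z = sum(p * vi for p, vi in zip(pooled, v)) + b
    return 1.0 / (1.0 + math.exp(-z))


# Toy clip of three frames; the middle frame has the strongest RGB and motion activity.
rgb = [[0.1, 0.2], [0.9, 0.8], [0.2, 0.1]]
flow = [[0.0, 0.1], [1.0, 0.7], [0.1, 0.0]]
frames = fuse_streams(rgb, flow)
pooled, alphas = temporal_attention_pool(frames, w=[1.0, 1.0, 1.0, 1.0])
prob = violence_probability(pooled, v=[0.5, 0.5, 0.5, 0.5])
print([round(a, 3) for a in alphas], round(prob, 3))
```

In this toy clip the middle frame receives most of the attention weight, so the pooled descriptor is dominated by the frame with the strongest combined appearance and motion response. In a trained network, `w`, `v`, and `b` would be learned parameters rather than hand-picked constants.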