14 Feb 2019 | Waqas Sultani, Chen Chen, Mubarak Shah
This paper proposes a deep multiple instance learning (MIL) framework for real-world anomaly detection in surveillance videos. The method uses weakly labeled training videos, where labels are given at the video level rather than the clip level, to learn an anomaly ranking model automatically. Sparsity and temporal smoothness constraints are added to the ranking loss function to better localize anomalies during training. The paper also introduces a new large-scale dataset of 1900 real-world surveillance videos covering 13 types of anomalies and 128 hours of footage. The dataset supports two tasks: general anomaly detection and recognition of specific anomalous activities. Experimental results show that the proposed MIL method significantly improves anomaly detection performance over state-of-the-art approaches and produces few false alarms on normal videos. The dataset also serves as a challenging benchmark for activity recognition on untrimmed videos because of its complexity and large intra-class variations; baselines including C3D and TCNN are evaluated on this task. Finally, the paper discusses the limitations of existing methods and highlights the importance of training on both normal and anomalous videos to obtain a robust anomaly detection system.
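To make the ranking loss with sparsity and temporal smoothness constraints more concrete, here is a minimal sketch in plain Python. It assumes each video is a "bag" of per-segment anomaly scores in [0, 1], that the ranking term is a hinge loss between the highest-scoring segment of an anomalous bag and the highest-scoring segment of a normal bag, and that both constraints are applied to the anomalous bag only. The function name `mil_ranking_loss`, the `margin` parameter, and the default weights `lambda_smooth` and `lambda_sparse` are illustrative assumptions, not values confirmed by this summary.

```python
def mil_ranking_loss(pos_scores, neg_scores,
                     lambda_smooth=8e-5, lambda_sparse=8e-5, margin=1.0):
    """Sketch of a MIL ranking loss for one (anomalous, normal) video pair.

    pos_scores: segment scores of a video labeled anomalous (positive bag).
    neg_scores: segment scores of a video labeled normal (negative bag).
    The weights and margin are hypothetical defaults chosen for illustration.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("both bags must contain at least one segment score")

    # Hinge ranking term: the top-scoring segment of the anomalous video
    # should outscore the top-scoring segment of the normal video by `margin`.
    ranking = max(0.0, margin - max(pos_scores) + max(neg_scores))

    # Temporal smoothness: penalize large score jumps between adjacent segments.
    smooth = sum((a - b) ** 2 for a, b in zip(pos_scores, pos_scores[1:]))

    # Sparsity: anomalies are assumed rare, so the total score should stay small.
    sparse = sum(pos_scores)

    return ranking + lambda_smooth * smooth + lambda_sparse * sparse


# Usage: one anomalous video with a short high-scoring burst, one normal video.
pos_bag = [0.1, 0.2, 0.9, 0.8, 0.1]
neg_bag = [0.1, 0.2, 0.1, 0.3, 0.2]
loss = mil_ranking_loss(pos_bag, neg_bag)
```

Only video-level labels are needed here: the loss never asks which segment of the anomalous video is the anomaly, only that its highest-scoring segment ranks above every segment of the normal video.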