11 January 2024 | Ghayth Almahadin · Maheswari Subburaj · Mohammad Hiari · Saranya Sathasivam Singaram · Bhanu Prakash Kolla · Pankaj Dadheech · Amol D. Vibhute · Sudhakar Sengan
This paper addresses the challenge of identifying suspicious activities or behaviors in crowded scenes, where occlusions between objects complicate anomaly detection. The authors propose a novel approach that combines spatio-temporal autoencoders with convolutional LSTM (ConvLSTM) networks to improve both the accuracy and the efficiency of anomaly detection in video sequences. Because it models the spatial and temporal dimensions jointly, the method captures complex motion patterns together with spatial information across consecutive video frames. The objective is a single, comprehensive model that can both detect and localize anomalies in complex video sequences, particularly those featuring human crowds.
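To make the ConvLSTM component concrete, here is a minimal NumPy sketch of one ConvLSTM layer. It uses the standard ConvLSTM gate equations without peephole connections; the function names, tensor shapes, and random initialization are illustrative assumptions, not the authors' implementation. The key property is that every gate is computed by a convolution rather than a matrix product, so the hidden and cell states keep the 2D layout of the frames.

```python
import numpy as np


def sigmoid(x):
    """Logistic function applied element-wise."""
    return 1.0 / (1.0 + np.exp(-x))


def conv2d_same(x, w, b):
    """'Same'-padded 2D cross-correlation (the 'convolution' used in CNNs).

    x: (C_in, H, W) input, w: (C_out, C_in, k, k) kernel with odd k, b: (C_out,) bias.
    Returns an array of shape (C_out, H, W).
    """
    c_out, _, k, _ = w.shape
    p = k // 2
    h, wd = x.shape[1:]
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))  # zero-pad the spatial dims only
    out = np.zeros((c_out, h, wd))
    for dy in range(k):
        for dx in range(k):
            patch = xp[:, dy:dy + h, dx:dx + wd]  # shifted window, (C_in, H, W)
            out += np.einsum("oc,chw->ohw", w[:, :, dy, dx], patch)
    return out + b[:, None, None]


def convlstm_step(x, h, c, w, b):
    """One ConvLSTM time step (no peephole connections, an assumed variant).

    Input frame and previous hidden state are stacked on the channel axis and
    convolved once; the result is split into the four gate pre-activations.
    """
    z = conv2d_same(np.concatenate([x, h], axis=0), w, b)
    i, f, g, o = np.split(z, 4, axis=0)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)  # input, forget, output gates
    g = np.tanh(g)                                # candidate cell content
    c_new = f * c + i * g                         # cell state keeps spatial layout
    h_new = o * np.tanh(c_new)
    return h_new, c_new


def convlstm_sequence(frames, hidden, k=3, seed=0):
    """Run a randomly initialised ConvLSTM layer over frames of shape (T, C, H, W)."""
    t_len, c_in, height, width = frames.shape
    rng = np.random.default_rng(seed)
    w = rng.normal(0.0, 0.1, size=(4 * hidden, c_in + hidden, k, k))
    b = np.zeros(4 * hidden)
    h = np.zeros((hidden, height, width))
    c = np.zeros_like(h)
    outputs = []
    for t in range(t_len):
        h, c = convlstm_step(frames[t], h, c, w, b)
        outputs.append(h)
    return np.stack(outputs)  # (T, hidden, H, W)
```

For example, `convlstm_sequence(np.random.default_rng(1).random((10, 1, 32, 32)), hidden=8)` returns an array of shape `(10, 8, 32, 32)`: one spatial feature map stack per frame. Since the weights here are random, the sketch only demonstrates data flow and shapes; in a model like the one described, such layers would sit between a convolutional encoder and decoder and be trained end to end.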
The efficacy of the proposed model is evaluated on a benchmark dataset that simulates real-world conditions, in which vast amounts of video footage must be monitored in real time. The research also highlights the limitations of traditional Supervised Learning (SL) methods under these conditions: they struggle both to handle such a massive volume of data and to detect anomalies quickly enough. The paper is organized into sections covering the introduction, related work, proposed methodology, results and discussion, and conclusions.
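The abstract does not state how the model's output is turned into per-frame anomaly decisions for evaluation. A common choice for autoencoder-based detectors, assumed here, is to compute each frame's reconstruction error, min-max normalize the errors over the sequence into a regularity score, and flag frames whose score falls below a threshold. The function names and the default threshold of 0.5 are hypothetical.

```python
import numpy as np


def reconstruction_errors(frames, recon):
    """Per-frame Euclidean (L2) distance between input frames and reconstructions.

    frames, recon: arrays of shape (T, H, W) or (T, C, H, W).
    """
    diff = (frames - recon).reshape(len(frames), -1)
    return np.linalg.norm(diff, axis=1)


def regularity_scores(errors):
    """Min-max normalise errors over the sequence and invert them.

    1.0 marks the best-reconstructed (most regular) frame, 0.0 the worst.
    """
    e_min, e_max = errors.min(), errors.max()
    if e_max == e_min:  # constant error: no frame stands out
        return np.ones_like(errors, dtype=float)
    return 1.0 - (errors - e_min) / (e_max - e_min)


def flag_anomalies(scores, threshold=0.5):
    """Mark frames whose regularity score falls below a (hypothetical) threshold."""
    return scores < threshold


if __name__ == "__main__":
    frames = np.zeros((4, 2, 2))
    recon = frames.copy()
    recon[1, 0, 0] = 0.3  # small reconstruction error
    recon[2, 0, 0] = 3.0  # large reconstruction error
    scores = regularity_scores(reconstruction_errors(frames, recon))
    print(flag_anomalies(scores).tolist())  # [False, False, True, False]
```

Note that min-max normalization needs the whole sequence before any frame can be scored. For the real-time monitoring setting the abstract emphasizes, a deployment would more plausibly normalize over a sliding window or threshold the raw error directly.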