DECEMBER 2009 | Markus Enzweiler, Student Member, IEEE, and Dariu M. Gavrila
This paper provides a comprehensive survey and experimental study of monocular pedestrian detection systems. The authors cover the main components of a pedestrian detection system, including hypothesis generation, classification, and tracking. They evaluate a diverse set of state-of-the-art systems on a large-scale benchmark dataset. The experimental study includes four approaches:

- a Haar wavelet-based AdaBoost cascade
- histogram of oriented gradient (HOG) features combined with a linear SVM (HOG/linSVM)
- a neural network using local receptive fields (NN/LRF)
- combined shape-texture detection

The evaluation is conducted in two scenarios. The first is a generic evaluation that focuses on detection performance and processing speed. The second is an application-specific evaluation of pedestrian detection from a moving vehicle.
The results indicate that HOG/linSVM has the advantage at higher image resolutions and lower processing speeds. The wavelet-based AdaBoost cascade is superior at lower image resolutions and (near) real-time processing speeds. The benchmark dataset is made publicly available for further research. It includes many thousands of training samples and a 27-minute test sequence of more than 20,000 images.