2009 | Piotr Dollár, Christian Wojek, Bernt Schiele, Pietro Perona
The Caltech Pedestrian Dataset is introduced as a large-scale benchmark for pedestrian detection, containing 350,000 labeled pedestrian bounding boxes across 250,000 video frames. It is two orders of magnitude larger than existing datasets and includes challenging images in which pedestrians are low resolution and frequently occluded. The dataset provides rich annotations, including occlusion information, and is collected from a vehicle driving through urban environments. The authors propose improved evaluation metrics and demonstrate the limitations of the commonly used per-window measures, which can fail to predict performance on full images. They benchmark several detection systems, providing an overview of state-of-the-art performance and a direct comparison of existing methods. The benchmark also highlights situations where current methods fail, helping to identify future research directions.
The dataset includes statistics on pedestrian scale, occlusion, and position, which are more representative of real-world applications. Pedestrians are categorized into three scales: near (80 pixels or taller), medium (30-80 pixels), and far (30 pixels or less). The dataset also includes occlusion statistics, showing that over 70% of pedestrians are occluded in at least one frame. Position statistics show that pedestrians are typically located in a narrow band across the center of the image.
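The scale bands above can be sketched as a small helper function. The function name and the treatment of the exact 30-pixel boundary are illustrative assumptions (the stated bands, medium at 30-80 pixels and far at 30 pixels or less, overlap at 30); only the cutoff values themselves come from the text.

```python
def scale_band(bbox_height_px: float) -> str:
    """Assign a pedestrian to a scale band by bounding-box height in pixels.

    Thresholds follow the bands stated in the text: near (80 px or taller),
    medium (30-80 px), far (30 px or less). A height of exactly 30 px is
    assigned to "medium" here; the text's bands overlap at that boundary,
    so this tie-break is an assumption, not part of the benchmark.
    """
    if bbox_height_px >= 80:
        return "near"
    if bbox_height_px >= 30:
        return "medium"
    return "far"
```

For example, a 50-pixel-tall pedestrian falls in the medium band, where the paper notes most detection failures concentrate.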
The dataset is split into training and testing data, with three scenarios for evaluation. The authors emphasize the importance of using per-image evaluation metrics, which are more representative of real-world applications than per-window metrics. They evaluate seven promising pedestrian detectors, finding that HOG and MultiFtr perform well, while others perform poorly. The results show that current methods struggle with detecting pedestrians at smaller scales, under partial occlusion, and with atypical aspect ratios.
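The per-image protocol the authors advocate can be sketched as follows: detections on a full image are greedily matched to ground-truth boxes by intersection-over-union (IoU), and unmatched ground truth counts as misses while unmatched detections count as false positives. This is a simplified sketch assuming a PASCAL-style IoU threshold of 0.5; the function names, tuple layout, and tie-breaking details are illustrative, not the benchmark's actual toolkit.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def evaluate_image(detections, ground_truth, iou_thresh=0.5):
    """Greedily match detections to ground truth on one full image.

    `detections` is a list of (box, score) pairs; higher-scoring detections
    are matched first. Returns (misses, false_positives). Aggregating these
    over all test images yields the miss rate vs. false-positives-per-image
    curves used in per-image evaluation. Details such as ignore regions are
    omitted in this sketch.
    """
    matched = set()
    false_positives = 0
    for box, _score in sorted(detections, key=lambda d: -d[1]):
        best_j, best_iou = None, iou_thresh
        for j, gt in enumerate(ground_truth):
            if j in matched:
                continue
            overlap = iou(box, gt)
            if overlap >= best_iou:
                best_j, best_iou = j, overlap
        if best_j is None:
            false_positives += 1
        else:
            matched.add(best_j)
    misses = len(ground_truth) - len(matched)
    return misses, false_positives
```

The key contrast with per-window scoring is that false positives arising from the dense scanning process (double detections, responses at wrong scales or positions) are counted here, which is exactly what per-window measures fail to capture.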
The authors conclude that the Caltech Pedestrian Dataset provides a challenging benchmark for pedestrian detection, highlighting the need for research into detection at smaller scales and of partially occluded pedestrians. They also emphasize the importance of using per-image evaluation metrics and suggest future work on extending the benchmark to explore more issues. The dataset and evaluation code are available on the project website.