A Discriminatively Trained, Multiscale, Deformable Part Model

Pedro Felzenszwalb, David McAllester, Deva Ramanan
This paper presents a discriminatively trained, multiscale, deformable part model for object detection. The system achieves a two-fold improvement in average precision over the best performance in the 2006 PASCAL person detection challenge and outperforms the best results in the 2007 challenge in ten out of twenty categories. It relies heavily on deformable parts and on new methods for discriminative training.

Detection uses a scanning window approach, with a global "root" filter and several part models. Each part model specifies a spatial model and a part filter. The score of a detection window is the score of the root filter plus the sum, over parts, of the maximum over placements of that part. The features are HOG features at two different scales: a rigid template captures coarse features, and part templates capture finer-scale features.

Training combines a margin-sensitive approach for data mining hard negative examples with a formalism called latent SVM, and requires only bounding box labels for the positive examples. A latent SVM leads to a non-convex training problem, but the problem becomes convex once latent information is specified for the positive examples. The latent SVM formalism also allows for the effective use of latent information such as hierarchical models and models involving latent three-dimensional pose.

The system is evaluated on the PASCAL VOC 2006 and 2007 datasets, where it achieves high performance on both rigid and highly deformable objects. It processes an image in about 2 seconds and achieves recognition rates significantly better than previous systems. It is successful with a large or small amount of training data, detects objects over a wide range of scales and poses, and performs well on partially occluded objects.
The results show that the system is the current state-of-the-art in object detection.
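
The scoring rule in the summary (root filter score plus the sum, over parts, of the maximum over placements of each part) can be illustrated with a small sketch. This is not the authors' implementation: the names `Part`, `best_placement`, and `window_score` are hypothetical, the response grids are made-up numbers rather than real HOG filter responses, and the spatial model is assumed here to be a simple quadratic penalty on displacement from an anchor position.

```python
from dataclasses import dataclass
from typing import List, Tuple

Grid = List[List[float]]  # Grid[y][x]: filter response at that placement


@dataclass
class Part:
    response: Grid            # part filter response at each candidate placement
    anchor: Tuple[int, int]   # preferred (x, y) placement inside the window
    penalty: float            # assumed quadratic deformation weight (illustrative)


def best_placement(part: Part) -> Tuple[float, Tuple[int, int]]:
    """Return the maximum over placements of (response - deformation cost)."""
    ax, ay = part.anchor
    best_score, best_pos = float("-inf"), (ax, ay)
    for y, row in enumerate(part.response):
        for x, r in enumerate(row):
            cost = part.penalty * ((x - ax) ** 2 + (y - ay) ** 2)
            if r - cost > best_score:
                best_score, best_pos = r - cost, (x, y)
    return best_score, best_pos


def window_score(root_score: float, parts: List[Part]) -> float:
    """Root filter score plus the sum over parts of each part's best placement."""
    return root_score + sum(best_placement(p)[0] for p in parts)


# Toy responses: the first part scores best one cell to the right of its anchor.
head = Part(response=[[0.0, 0.2, 0.0],
                      [0.1, 1.0, 3.0],
                      [0.0, 0.3, 0.0]], anchor=(1, 1), penalty=0.5)
legs = Part(response=[[0.0, 0.5],
                      [2.0, 0.1]], anchor=(0, 1), penalty=1.0)

print(best_placement(head))              # (2.5, (2, 1))
print(window_score(1.5, [head, legs]))   # 6.0
```

In this toy example the first part is placed away from its anchor because the stronger filter response outweighs the assumed deformation cost, which is the sense in which the parts are "deformable". The brute-force loop over placements is only for readability and says nothing about how the system reaches its reported speed.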