Understanding A Discriminatively Trained, Multiscale, Deformable Part Model

This paper presents a discriminatively trained, multiscale, deformable part model for object detection. The system achieves a two-fold improvement in average precision over the best performance in the 2006 PASCAL person detection challenge and outperforms the best results in the 2007 challenge in ten out of twenty categories. The model combines a margin-sensitive approach for mining hard negative examples with a latent SVM. The latent SVM objective is semi-convex: it becomes convex once the latent values for the positive examples are fixed, so it can be trained by coordinate descent that alternates between choosing those latent values and solving the resulting convex problem. Detection uses a scanning-window approach with a global "root" filter and several part models, each consisting of a spatial model and a part filter. The score of a detection window is the root filter score plus, for each part, the best part filter score over its allowed placements minus a deformation cost. Training is reduced to binary classification by treating root positions and part locations as latent variables. The system is efficient, processing an image in about 2 seconds while achieving high recognition rates. Experimental results on the PASCAL VOC 2006 and 2007 datasets demonstrate its effectiveness on both rigid and deformable objects, with strong performance even when training data is limited. The paper also examines how the spatial model and the range of allowed deformations affect detection accuracy.
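The window-scoring rule above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation. It uses single-channel feature maps instead of HOG features. Part filters are applied to a second map at twice the root's resolution, which is how the model is multiscale. The deformation cost is written as a quadratic function of the displacement (dx, dy) from each part's anchor. The names `Part`, `anchor`, `deform`, `max_disp`, `filter_response`, `score_window` and `detect` are hypothetical. The argmax part placements that `score_window` returns are the kind of latent values the latent SVM maximizes over during training.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Single-channel feature map (simplification: the paper uses HOG features).
Grid = List[List[float]]


def filter_response(feat: Grid, filt: Grid, x: int, y: int) -> Optional[float]:
    """Dot product of `filt` with the window of `feat` whose top-left cell is (x, y).

    Returns None if the window does not fit inside the map.
    """
    h, w = len(filt), len(filt[0])
    if x < 0 or y < 0 or y + h > len(feat) or x + w > len(feat[0]):
        return None
    return sum(filt[j][i] * feat[y + j][x + i] for j in range(h) for i in range(w))


@dataclass
class Part:
    filt: Grid                                  # part filter (applied at 2x root resolution)
    anchor: Tuple[int, int]                     # hypothetical ideal offset from 2 * root position
    deform: Tuple[float, float, float, float]   # hypothetical weights on (dx, dy, dx^2, dy^2)
    max_disp: int = 2                           # hypothetical bound on allowed displacement


def score_window(root_feat: Grid, part_feat: Grid, root_filt: Grid,
                 parts: List[Part], x: int, y: int
                 ) -> Tuple[Optional[float], List[Tuple[int, int]]]:
    """Root response at (x, y) plus, per part, max over placements of
    (part response - deformation cost). Also returns the argmax placements."""
    score = filter_response(root_feat, root_filt, x, y)
    if score is None:
        return None, []
    placements = []
    for part in parts:
        # Parts live on the finer map, so the root position is doubled.
        ax, ay = 2 * x + part.anchor[0], 2 * y + part.anchor[1]
        d1, d2, d3, d4 = part.deform
        best, best_pos = None, None
        for dy in range(-part.max_disp, part.max_disp + 1):
            for dx in range(-part.max_disp, part.max_disp + 1):
                r = filter_response(part_feat, part.filt, ax + dx, ay + dy)
                if r is None:
                    continue  # placement falls outside the feature map
                cost = d1 * dx + d2 * dy + d3 * dx * dx + d4 * dy * dy
                if best is None or r - cost > best:
                    best, best_pos = r - cost, (ax + dx, ay + dy)
        if best is None:
            return None, []  # no legal placement for this part
        score += best
        placements.append(best_pos)
    return score, placements


def detect(root_feat: Grid, part_feat: Grid, root_filt: Grid,
           parts: List[Part], threshold: float):
    """Scanning-window detection: score every root position, keep scores above threshold."""
    hits = []
    for y in range(len(root_feat)):
        for x in range(len(root_feat[0])):
            s, placements = score_window(root_feat, part_feat, root_filt, parts, x, y)
            if s is not None and s > threshold:
                hits.append((s, (x, y), placements))
    return sorted(hits, key=lambda h: h[0], reverse=True)
```

Consider a 2x2 root map whose top-left cell is 1.0 and a one-cell root filter `[[2.0]]`. Add one part whose strongest response (3.0) sits one cell diagonally from its anchor, with quadratic weights of 0.5. The window at (0, 0) then scores 2.0 + (3.0 - 1.0) = 4.0, and its latent part placement is (1, 1). The brute-force loop over displacements keeps the sketch short. The original system instead computes this maximum over all placements efficiently with generalized distance transforms.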

A Discriminatively Trained, Multiscale, Deformable Part Model

Pedro Felzenszwalb, David McAllester, Deva Ramanan