Ensemble of Exemplar-SVMs for Object Detection and Beyond

Tomasz Malisiewicz, Abhinav Gupta, Alexei A. Efros
This paper proposes a simple yet powerful method for object detection that combines the effectiveness of a discriminative object detector with the explicit correspondence offered by a nearest-neighbor approach. The method trains a separate linear SVM classifier for each exemplar in the training set, so each Exemplar-SVM is defined by a single positive instance and millions of negative examples. Although each detector is highly specific to its exemplar, an ensemble of such Exemplar-SVMs generalizes surprisingly well. The method achieves performance comparable to complex latent part-based models on the PASCAL VOC detection task, at only a modest increase in computational cost. Its key benefit is the explicit association between each detection and a training exemplar, which allows meta-data such as segmentation, geometry, and 3D models to be transferred directly onto the detections.
The motivation for this approach stems from the difficulty of handling large amounts of negative data in object detection. Traditional detectors such as those of Dalal and Triggs and of Felzenszwalb et al. mine hard negatives and train a discriminative classifier on them. However, these methods often assume that all positive examples of a category are related, which is not always true. The proposed method instead uses a non-parametric representation for the positives and a parametric representation for the negatives, which allows for better generalization.
Each exemplar's classifier uses a rigid HOG template and is discriminatively trained to separate that exemplar from the negative examples. Because the classifiers are trained independently, their raw scores are not directly comparable, so a calibration step adjusts them; this also makes the ensemble robust to variations in exemplar quality. Calibration fits a logistic function to each classifier's scores, based on the overlap between its detections and the ground-truth bounding boxes.
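The training and calibration steps described above can be illustrated with a small sketch. It is not the authors' implementation: random vectors stand in for HOG templates, the solver is plain subgradient descent, and the names `train_exemplar_svm`, `fit_sigmoid`, `c_pos`, and `c_neg` (along with their values) are assumptions made for this example. The held-out labels stand in for "detection overlaps a ground-truth box sufficiently", which the sketch simply simulates.

```python
import numpy as np


def train_exemplar_svm(exemplar, negatives, c_pos=0.5, c_neg=0.01,
                       lr=0.01, epochs=500):
    """Subgradient descent on a hinge-loss objective with one positive:
    ||w||^2 + c_pos * hinge(w.x_E + b) + c_neg * sum_neg hinge(-(w.x + b)).
    c_pos > c_neg is an illustrative choice so the lone positive is not
    swamped by the negatives."""
    w = np.zeros(exemplar.shape[0])
    b = 0.0
    for _ in range(epochs):
        grad_w, grad_b = 2.0 * w, 0.0
        # The single positive contributes while it scores below +1.
        if exemplar @ w + b < 1.0:
            grad_w = grad_w - c_pos * exemplar
            grad_b -= c_pos
        # Negatives contribute while they score above -1.
        violating = negatives @ w + b > -1.0
        grad_w = grad_w + c_neg * negatives[violating].sum(axis=0)
        grad_b += c_neg * violating.sum()
        w = w - lr * grad_w
        b -= lr * grad_b
    return w, b


def sigmoid(raw, alpha, beta):
    """Calibrated score: logistic function of the raw SVM score."""
    return 1.0 / (1.0 + np.exp(-alpha * (raw - beta)))


def fit_sigmoid(scores, is_correct, lr=0.1, steps=2000):
    """Fit alpha, beta by gradient descent on the logistic log-loss.
    is_correct[i] is 1 if detection i counts as a true match, else 0."""
    a, c = 0.0, 0.0  # optimise z = a * score + c, then convert
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(a * scores + c)))
        err = p - is_correct
        a -= lr * np.mean(err * scores)
        c -= lr * np.mean(err)
    return a, -c / a  # alpha, beta with z = alpha * (score - beta)


# Synthetic stand-ins for HOG features (assumption: 16-D Gaussian data).
rng = np.random.default_rng(0)
dim = 16
negatives = rng.normal(size=(200, dim))
exemplar = rng.normal(size=dim) + 2.0

w, b = train_exemplar_svm(exemplar, negatives)
pos_score = exemplar @ w + b
neg_scores = negatives @ w + b

# Simulated held-out detections: near-copies of the exemplar count as
# correct, fresh background windows count as incorrect.
val_pos = exemplar + 0.5 * rng.normal(size=(20, dim))
val_neg = rng.normal(size=(100, dim))
val_scores = np.concatenate([val_pos, val_neg]) @ w + b
val_labels = np.concatenate([np.ones(20), np.zeros(100)])

alpha, beta = fit_sigmoid(val_scores, val_labels)
print("exemplar score:", pos_score, "max negative score:", neg_scores.max())
print("alpha:", alpha, "beta:", beta)
```

After calibration, `sigmoid(raw, alpha, beta)` maps each classifier's raw score into the range 0 to 1, so that scores from different exemplars can be compared on a common scale.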
The method is evaluated on the PASCAL VOC 2007 dataset, where it performs competitively with state-of-the-art methods. It also transfers high-quality meta-data such as segmentation, geometry, and 3D models onto its detections. These results show that the explicit association between detections and exemplars enables a variety of applications beyond basic detection.
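As a rough illustration of how the detection-to-exemplar association makes meta-data transfer possible, the sketch below scores one candidate window with every exemplar in a toy ensemble, keeps the best calibrated score, and copies that exemplar's annotations onto the detection. Everything here is hypothetical: the `Exemplar` class, the `detect_and_transfer` function, the two-dimensional features, and the meta-data fields are invented for the example, and a real detector would also scan many windows and suppress overlapping detections.

```python
import math
from dataclasses import dataclass, field
from typing import Optional, Sequence


@dataclass
class Exemplar:
    """One trained exemplar: linear template, calibration, annotations."""
    name: str
    weights: Sequence[float]
    bias: float
    alpha: float  # calibration slope (hypothetical values below)
    beta: float   # calibration offset
    meta: dict = field(default_factory=dict)

    def calibrated_score(self, feature: Sequence[float]) -> float:
        raw = sum(w * x for w, x in zip(self.weights, feature)) + self.bias
        return 1.0 / (1.0 + math.exp(-self.alpha * (raw - self.beta)))


def detect_and_transfer(feature, ensemble, threshold=0.5) -> Optional[dict]:
    """Return the best-matching exemplar's meta-data, or None if no
    exemplar fires above the threshold."""
    best = max(ensemble, key=lambda e: e.calibrated_score(feature))
    score = best.calibrated_score(feature)
    if score < threshold:
        return None
    # The detection inherits whatever was annotated on the exemplar.
    return {"exemplar": best.name, "score": score, "meta": best.meta}


ensemble = [
    Exemplar("bus_side_view", [1.0, 0.0], -0.5, 4.0, 0.0,
             meta={"segmentation": "mask_017", "geometry": "side-facing"}),
    Exemplar("bus_front_view", [0.0, 1.0], -0.5, 4.0, 0.0,
             meta={"segmentation": "mask_042", "geometry": "front-facing"}),
]

hit = detect_and_transfer([1.0, 0.1], ensemble)
miss = detect_and_transfer([0.0, 0.0], ensemble)
print(hit)
print(miss)
```

Because each detection names the single exemplar that produced it, any annotation attached to that exemplar can be carried over without a separate model for segmentation or geometry.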