Simultaneous Detection and Segmentation

7 Jul 2014 | Bharath Hariharan, Pablo Arbeláez, Ross Girshick, and Jitendra Malik
This paper introduces the task of simultaneous detection and segmentation (SDS): detect all instances of a category in an image and mark the pixels that belong to each instance. Unlike classical bounding-box detection, SDS requires a segmentation, not just a box; unlike classical semantic segmentation, SDS requires distinguishing individual object instances. The proposed method builds on recent work applying convolutional neural networks (CNNs) to bottom-up region proposals, introducing a novel architecture tailored for SDS, and uses category-specific, top-down figure-ground predictions to refine the bottom-up proposals.

Concretely, the pipeline uses Multiscale Combinatorial Grouping (MCG) to generate region candidates, extracts CNN features from both the bounding box and the region foreground, and trains a classifier on these region features; a subsequent refinement step improves each detected region's segmentation. Evaluated on PASCAL VOC 2012, the method achieves a 7-point boost (16% relative) over baselines on SDS, reaching 49.5% AP^r (region-level average precision), and improves AP^b (box-level average precision) from 51.0% (R-CNN) to 53.0%, state-of-the-art performance in object detection. It also raises the state of the art in semantic segmentation from 47.9% to 52.6%, a 5-point (roughly 10% relative) improvement. Diagnostic tools for analyzing SDS error modes are provided as well. Code is publicly available at http://www.eecs.berkeley.edu/Research/Projects/CS/vision/shape/sds.
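The scoring stage described above can be sketched in miniature. This is a hedged illustration, not the paper's implementation: the real system runs two CNN streams (one on the tight bounding box, one on the foreground with background pixels masked out) and trains per-category SVMs, whereas here mean-color statistics stand in for the CNN features and a fixed linear classifier stands in for the trained SVM. All function names below are hypothetical.

```python
import numpy as np

def extract_features(image, mask):
    """Stand-in for the paper's two feature streams: one descriptor from
    the region's tight bounding box, one from the region foreground only.
    Mean colors are used here as placeholder features (assumption)."""
    ys, xs = np.nonzero(mask)
    y0, y1 = ys.min(), ys.max() + 1
    x0, x1 = xs.min(), xs.max() + 1
    box_patch = image[y0:y1, x0:x1]                    # bounding-box crop
    box_feat = box_patch.reshape(-1, image.shape[2]).mean(axis=0)
    fg_feat = image[mask].mean(axis=0)                 # foreground pixels only
    return np.concatenate([box_feat, fg_feat])         # joint descriptor

def score_regions(image, masks, weights, bias):
    """Score each candidate region with a linear classifier, standing in
    for the per-category classifier trained on CNN features."""
    feats = np.stack([extract_features(image, m) for m in masks])
    return feats @ weights + bias

# Toy usage: one synthetic image and one candidate region mask,
# as MCG would supply in the real pipeline.
rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))
mask = np.zeros((32, 32), dtype=bool)
mask[8:20, 10:25] = True
scores = score_regions(image, [mask], rng.random(6), 0.0)
```

The key design point this mirrors is that box and foreground features are concatenated into a single descriptor, so the classifier can exploit both the object's context and its shape.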