BoxSup: Exploiting Bounding Boxes to Supervise Convolutional Networks for Semantic Segmentation

The paper "BoxSup: Exploiting Bounding Boxes to Supervise Convolutional Networks for Semantic Segmentation" by Jifeng Dai, Kaiming He, and Jian Sun of Microsoft Research introduces BoxSup, a method that uses bounding box annotations to train deep convolutional networks for semantic segmentation. The key idea is to alternate between two steps: generating region proposals with unsupervised methods, and training the network with those proposals as supervision. The trained network in turn improves the proposals, and the cycle is repeated so that the network and the proposals improve together. Using only bounding box annotations, the method achieves competitive results (e.g., 62.0% mAP on validation), comparable to strong baselines (e.g., 63.8% mAP) that are fully supervised by pixel-level masks. By exploiting a large number of bounding boxes, BoxSup further improves the performance of deep networks, achieving state-of-the-art results on the PASCAL VOC 2012 and PASCAL-CONTEXT datasets. The paper also includes an error analysis showing that the improvement comes primarily from better object recognition accuracy rather than boundary detection.
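
The alternation described above can be illustrated with a small toy sketch. This is not the authors' implementation: the scoring rule (a weighted mix of overlap with the box and agreement with the network's current prediction), the `weight` parameter, and the `train_and_predict` callable standing in for network training and inference are all assumptions made for illustration. Masks are represented as sets of pixel coordinates so the example needs no external libraries.

```python
from typing import Callable, List, Set, Tuple

Pixel = Tuple[int, int]
Mask = Set[Pixel]
Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), x1 and y1 exclusive


def box_to_mask(box: Box) -> Mask:
    """Return the set of pixels covered by a bounding box."""
    x0, y0, x1, y1 = box
    return {(x, y) for x in range(x0, x1) for y in range(y0, y1)}


def iou(a: Mask, b: Mask) -> float:
    """Intersection over union of two pixel sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def pick_mask(box: Box, candidates: List[Mask], predicted: Mask,
              weight: float = 0.5) -> Mask:
    """Pick the candidate that best fits the box and the current prediction.

    The scoring rule is an illustrative assumption, not the paper's objective.
    """
    box_mask = box_to_mask(box)

    def score(candidate: Mask) -> float:
        return ((1.0 - weight) * iou(candidate, box_mask)
                + weight * iou(candidate, predicted))

    return max(candidates, key=score)


def alternate(box: Box, candidates: List[Mask],
              train_and_predict: Callable[[Mask], Mask],
              rounds: int = 3) -> Mask:
    """Alternate between choosing a mask and updating the 'network'.

    `train_and_predict` is a hypothetical stand-in: it receives the mask
    used as supervision and returns the network's new prediction.
    """
    predicted = box_to_mask(box)  # start by treating the whole box as the mask
    chosen = predicted
    for _ in range(rounds):
        chosen = pick_mask(box, candidates, predicted)  # update supervision
        predicted = train_and_predict(chosen)           # update network
    return chosen


if __name__ == "__main__":
    box = (0, 0, 4, 4)
    left_half = {(x, y) for x in range(2) for y in range(4)}
    nearly_full = box_to_mask(box) - {(3, 3)}
    # Toy "network" that has learned the object lies in the left half.
    final = alternate(box, [left_half, nearly_full],
                      lambda target: target & left_half)
    print(sorted(final))
```

In this toy run the first round favors the candidate that nearly fills the box, because nothing else is known yet. Once the stand-in network's prediction is fed back, later rounds switch to the candidate that matches it, which mirrors the mutual refinement of network and proposals described in the summary.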

18 May 2015 | Jifeng Dai, Kaiming He, Jian Sun