Weakly- and Semi-Supervised Learning of a Deep Convolutional Network for Semantic Image Segmentation

5 Oct 2015 | George Papandreou*, Liang-Chieh Chen*, Kevin Murphy, Alan L. Yuille
This paper addresses the challenging problem of learning Deep Convolutional Neural Networks (DCNNs) for semantic image segmentation from weakly or semi-supervised data. The authors develop Expectation-Maximization (EM) methods to train DCNNs using either weak annotations, such as bounding boxes or image-level labels, or a combination of a few strongly labeled and many weakly labeled images. Extensive experiments on the PASCAL VOC 2012 dataset demonstrate that the proposed techniques achieve competitive results with significantly less annotation effort than fully supervised methods. The EM algorithms perform well in both the weakly supervised and semi-supervised settings, nearly matching the performance of fully supervised models. Additionally, combining annotations from multiple datasets, such as PASCAL VOC and MS-COCO, further improves performance, achieving a mean intersection-over-union (IOU) score of 73.9% on the PASCAL VOC 2012 benchmark. The paper also includes a detailed analysis of the effect of Field-Of-View (FOV) in the network architecture and provides qualitative segmentation results.
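The mean IOU figure quoted above is the per-class ratio of intersection to union between predicted and ground-truth label maps, averaged over classes. The following is a minimal sketch of that metric, not the authors' evaluation code. The function name `mean_iou`, the flattened-list input format, and the choice to skip classes that appear in neither map are assumptions made for illustration; the `ignore_label` default of 255 follows the PASCAL VOC convention for void pixels. The official benchmark accumulates counts over the whole test set rather than per image.

```python
from typing import Sequence


def mean_iou(pred: Sequence[int], truth: Sequence[int],
             num_classes: int, ignore_label: int = 255) -> float:
    """Mean intersection-over-union between two flattened label maps.

    Illustrative sketch only: pixels whose ground-truth label equals
    `ignore_label` are skipped, and classes absent from both maps are
    left out of the average (an assumption of this sketch).
    """
    if len(pred) != len(truth):
        raise ValueError("pred and truth must have the same length")

    intersection = [0] * num_classes  # pixels where pred == truth == c
    pred_area = [0] * num_classes     # pixels predicted as class c
    truth_area = [0] * num_classes    # pixels labeled as class c

    for p, t in zip(pred, truth):
        if t == ignore_label:
            continue
        pred_area[p] += 1
        truth_area[t] += 1
        if p == t:
            intersection[p] += 1

    ious = []
    for c in range(num_classes):
        union = pred_area[c] + truth_area[c] - intersection[c]
        if union > 0:
            ious.append(intersection[c] / union)

    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    # Class 0: IOU = 1/2, class 1: IOU = 2/3, so the mean is 7/12.
    print(mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2))
```

For a whole dataset, the three per-class counters would be accumulated across all images before the division, so that large and small images contribute in proportion to their pixel counts.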