5 Dec 2017 | Liang-Chieh Chen, George Papandreou, Florian Schroff, Hartwig Adam
This paper revisits atrous convolution, a powerful tool for adjusting the field-of-view of filters and controlling feature resolution in Deep Convolutional Neural Networks (DCNNs), in the context of semantic image segmentation. The authors design modules that employ atrous convolution in cascade or in parallel to capture multi-scale context by using multiple atrous rates. They also propose an augmented version of the Atrous Spatial Pyramid Pooling (ASPP) module, which includes image-level features to boost performance. The proposed 'DeepLabv3' system significantly improves over previous DeepLab versions without DenseCRF post-processing and achieves performance comparable to state-of-the-art models on the PASCAL VOC 2012 benchmark. The paper discusses implementation details, training protocols, and experimental results, including a bootstrapping method for handling rare and finely annotated objects.
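As a rough illustration of how an atrous rate enlarges the field-of-view without adding weights, here is a minimal 1-D sketch in plain Python. It is a toy under stated assumptions, not the DeepLabv3 implementation: the function names (`effective_kernel_size`, `atrous_conv1d`, `parallel_branches`), the zero padding, and the 1-D setting are all chosen here for illustration. Each kernel tap is spaced `rate` samples apart, and the parallel helper mimics only the "same filter, several rates" idea behind ASPP.

```python
def effective_kernel_size(kernel_size, rate):
    """Span (in input samples) covered by a kernel whose taps are `rate` apart."""
    return kernel_size + (kernel_size - 1) * (rate - 1)


def atrous_conv1d(signal, kernel, rate):
    """Toy 1-D atrous convolution (cross-correlation form).

    Output sample i sums kernel[j] * signal[i + (j - center) * rate].
    Positions outside the signal are treated as zero (an assumption of this
    sketch), so the output has the same length as the input.
    """
    center = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = i + (j - center) * rate
            if 0 <= idx < len(signal):
                acc += w * signal[idx]
        out.append(acc)
    return out


def parallel_branches(signal, kernel, rates):
    """Apply one kernel at several rates in parallel; returns {rate: output}."""
    return {r: atrous_conv1d(signal, kernel, r) for r in rates}


if __name__ == "__main__":
    impulse = [0, 0, 0, 1, 0, 0, 0]
    # With rate 2, a 3-tap kernel touches samples two apart, spanning 5 inputs.
    print(effective_kernel_size(3, 2))
    print(parallel_branches(impulse, [1, 1, 1], rates=[1, 2]))
```

With rate 1 this reduces to an ordinary convolution. Larger rates widen the span each output sees while the number of weights stays fixed, which is the property the cascaded and parallel modules rely on.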