5 Dec 2017 | Liang-Chieh Chen, George Papandreou, Florian Schroff, Hartwig Adam
This paper revisits atrous convolution for semantic image segmentation. Atrous convolution lets a deep convolutional neural network (DCNN) explicitly adjust a filter's field-of-view and control the resolution at which feature responses are computed. The authors propose a system called DeepLabv3. It significantly improves over previous DeepLab versions even without DenseCRF post-processing, and it attains performance comparable to other state-of-the-art models on the PASCAL VOC 2012 benchmark.
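The field-of-view claim can be made concrete with a minimal 1-D sketch. It assumes plain Python lists and no padding. Like deep-learning frameworks, it computes cross-correlation. The names `atrous_conv1d` and `effective_kernel_size` are hypothetical. Inserting `rate - 1` gaps between the taps of a k-tap filter widens its span to k + (k - 1)(rate - 1) inputs, while the number of weights stays k.

```python
def atrous_conv1d(x, w, rate=1):
    """Valid 1-D atrous (dilated) convolution: y[i] = sum_k x[i + rate*k] * w[k]."""
    span = rate * (len(w) - 1) + 1  # inputs covered by one output (effective kernel size)
    if span > len(x):
        return []
    return [
        sum(x[i + rate * k] * w[k] for k in range(len(w)))
        for i in range(len(x) - span + 1)
    ]


def effective_kernel_size(k, rate):
    """Field-of-view of a k-tap filter with rate - 1 zeros inserted between taps."""
    return k + (k - 1) * (rate - 1)


x = [1, 2, 3, 4, 5, 6, 7]
w = [1, 1, 1]                       # same 3 weights in both calls
print(atrous_conv1d(x, w, rate=1))  # [6, 9, 12, 15, 18]
print(atrous_conv1d(x, w, rate=2))  # [9, 12, 15]
print(effective_kernel_size(3, 2))  # 5
```

With rate 2, the same three weights cover five inputs. This is how atrous convolution enlarges context without adding parameters.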
The key contributions of the paper include modules that apply atrous convolution in cascade or in parallel, with multiple atrous rates, to capture multi-scale context. The authors also augment their previously proposed Atrous Spatial Pyramid Pooling (ASPP) module with image-level features that encode global context and further boost performance. Finally, they elaborate on implementation details and share their experience training the system.
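The parallel layout with an image-level branch can be sketched on a single-channel feature map. This is a simplification under stated assumptions:

- All atrous branches share one hypothetical `kernel`.
- The 1x1 branch is a single scalar weight, `pointwise_w`.
- The per-branch convolutions with many filters, batch normalization, and the final fusing convolution of the real module are omitted.

Bilinear upsampling of a 1x1 pooled map reduces to broadcasting a constant, which is what `image_pooling` does.

```python
def atrous_conv2d_same(fmap, kernel, rate):
    """k x k atrous convolution with zero padding, so output size equals input size."""
    h, w = len(fmap), len(fmap[0])
    k = len(kernel)  # assumes a square, odd-sized kernel
    c = k // 2       # offset of the centre tap
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for a in range(k):
                for b in range(k):
                    ii = i + (a - c) * rate
                    jj = j + (b - c) * rate
                    if 0 <= ii < h and 0 <= jj < w:  # taps outside the map see zeros
                        acc += fmap[ii][jj] * kernel[a][b]
            out[i][j] = acc
    return out


def image_pooling(fmap):
    """Image-level branch: global average pooling, broadcast back to H x W."""
    h, w = len(fmap), len(fmap[0])
    mean = sum(map(sum, fmap)) / (h * w)
    return [[mean] * w for _ in range(h)]


def aspp(fmap, kernel, pointwise_w, rates=(6, 12, 18)):
    """Return branch outputs as channels: [1x1, one per atrous rate, image pooling]."""
    branches = [[[v * pointwise_w for v in row] for row in fmap]]  # 1x1 branch
    branches += [atrous_conv2d_same(fmap, kernel, r) for r in rates]
    branches.append(image_pooling(fmap))
    return branches  # the real module would concatenate and fuse these


fmap = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
kernel = [[1] * 3 for _ in range(3)]
branches = aspp(fmap, kernel, pointwise_w=0.5)
print(len(branches))        # 5
print(branches[1] == fmap)  # True: at rate 6 only the centre tap lands on the 4x4 map
print(branches[4][0][0])    # 8.5
```

The example shows why the image-level branch helps. When the rate approaches the feature-map size, a 3x3 atrous filter degenerates into a 1x1 filter, because only its centre weight touches valid pixels. Global average pooling restores access to whole-image context.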
The proposed system, DeepLabv3, uses atrous convolution to extract dense feature maps and capture long-range context. In the cascaded design, copies of the last ResNet block are stacked with progressively larger atrous rates to encode multi-scale information. In the parallel design, the ASPP module probes the features with filters at multiple sampling rates, and hence multiple effective fields-of-view, while the added image-level features supply global context. Training uses random scaling and left-right flipping for data augmentation. It also bootstraps on hard images, duplicating training images that contain hard classes, to better handle rare and finely annotated objects. At evaluation time, predictions can additionally be combined over multi-scale and left-right flipped inputs.
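The rate bookkeeping behind these modules can be written as a few small helpers. Two rules follow the paper's conventions:

- Inside a block, a Multi_Grid tuple multiplies the block's unit rate.
- The ASPP rates (6, 12, 18) are quoted for output_stride 16 and are doubled at output_stride 8.

The `receptive_field` helper is an added illustration. It assumes stride-1 layers and counts positions along one axis of the feature map. All function names are hypothetical.

```python
def multi_grid_rates(unit_rate, multi_grid=(1, 2, 4)):
    """Final atrous rates of the convolutions in a block: unit rate x Multi_Grid entry."""
    return tuple(unit_rate * r for r in multi_grid)


def aspp_rates(output_stride, base=(6, 12, 18)):
    """ASPP rates given at output_stride 16; halving the stride doubles the rates."""
    if output_stride not in (8, 16):
        raise ValueError("this sketch only handles output_stride 8 or 16")
    return tuple(r * (16 // output_stride) for r in base)


def receptive_field(rates, k=3):
    """One-axis receptive field of cascaded stride-1 k x k atrous convolutions."""
    return 1 + sum((k - 1) * r for r in rates)


print(multi_grid_rates(2))          # (2, 4, 8)
print(aspp_rates(8))                # (12, 24, 36)
print(receptive_field((2, 4, 8)))   # 29: cascade, context accumulates layer by layer
print(receptive_field((18,)))       # 37: one parallel branch at rate 18
```

The comparison illustrates the two designs. A cascade grows context additively through depth. Parallel branches reach different fields-of-view side by side, each in a single layer.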
The experimental results show that DeepLabv3 significantly improves over previous DeepLab versions. Without DenseCRF post-processing, it achieves 85.7% mean intersection-over-union (mIOU) on the PASCAL VOC 2012 test set, using a model pretrained on MS-COCO before fine-tuning on PASCAL VOC. A variant whose ResNet-101 backbone is additionally pretrained on JFT-300M reaches 86.9% on the same test set. The authors also report 81.3% mIOU on the Cityscapes test set.
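Mean IoU, the metric behind these percentages, is typically computed from a confusion matrix accumulated over all evaluated pixels. The sketch below assumes flat lists of integer labels. It also assumes the common PASCAL VOC convention that label 255 marks void pixels to be ignored. The names `confusion_matrix` and `mean_iou` are hypothetical, not an official evaluation API.

```python
def confusion_matrix(gt, pred, num_classes, ignore_label=255):
    """conf[t][p] counts pixels whose true class is t and predicted class is p."""
    conf = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(gt, pred):
        if t == ignore_label:  # void pixels do not count
            continue
        conf[t][p] += 1
    return conf


def mean_iou(conf):
    """Average over classes of TP / (TP + FP + FN)."""
    n = len(conf)
    ious = []
    for c in range(n):
        tp = conf[c][c]
        fn = sum(conf[c]) - tp                       # true c, predicted as another class
        fp = sum(conf[r][c] for r in range(n)) - tp  # predicted c, truly another class
        denom = tp + fp + fn
        if denom > 0:  # skip classes absent from both ground truth and prediction
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


gt   = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 255]
pred = [0, 0, 0, 1, 0, 0, 1, 1, 1, 1, 0]
conf = confusion_matrix(gt, pred, num_classes=2)
print(conf)                           # [[3, 1], [2, 4]]
print(round(mean_iou(conf), 4))       # 0.5357
```

Accumulating the matrix over the whole dataset before dividing weights every pixel equally. Averaging per-image IoUs would generally give a different number.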