Understanding Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation

This paper proposes DeepLabv3+, an encoder-decoder model for semantic image segmentation that combines the advantages of spatial pyramid pooling and encoder-decoder structures. The model extends DeepLabv3 by adding a simple yet effective decoder module to refine segmentation results, especially along object boundaries. It also incorporates depthwise separable convolution into both the Atrous Spatial Pyramid Pooling (ASPP) and decoder modules, resulting in a faster and stronger encoder-decoder network. The model is evaluated on the PASCAL VOC 2012 and Cityscapes datasets, achieving test set performance of 89.0% and 82.1%, respectively, without any post-processing. The model is implemented in TensorFlow and made publicly available. Key contributions include the proposed encoder-decoder structure, the ability to control encoder feature resolution via atrous convolution, and the adaptation of the Xception model with depthwise separable convolution. The model demonstrates improved performance in terms of both speed and accuracy, and achieves state-of-the-art results on the benchmark datasets.
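
To make the term "atrous separable convolution" concrete, the following is a minimal one-dimensional sketch in plain Python, not the authors' TensorFlow implementation. It assumes zero padding and stride 1, and all function names (atrous_depthwise_1d, pointwise, atrous_separable_conv_1d, effective_kernel_size, conv_weight_counts) are hypothetical names chosen for illustration. A depthwise filter is applied to each channel with gaps of size rate between its taps, and a pointwise (1x1) convolution then mixes the channels.

```python
def atrous_depthwise_1d(x, kernels, rate):
    """Depthwise atrous convolution (illustrative sketch).

    x:       list of channels, each a list of numbers
    kernels: one odd-length filter per channel
    rate:    atrous (dilation) rate; rate=1 is an ordinary convolution
    """
    out = []
    for channel, kernel in zip(x, kernels):
        center = len(kernel) // 2
        n = len(channel)
        y = []
        for i in range(n):
            acc = 0.0
            for j, w in enumerate(kernel):
                # Neighbouring taps are `rate` samples apart in the input.
                idx = i + (j - center) * rate
                if 0 <= idx < n:  # positions outside the input count as zero
                    acc += w * channel[idx]
            y.append(acc)
        out.append(y)
    return out


def pointwise(x, weights):
    """1x1 convolution: weights[o][c] mixes input channel c into output o."""
    n = len(x[0])
    return [
        [sum(w * x[c][i] for c, w in enumerate(row)) for i in range(n)]
        for row in weights
    ]


def atrous_separable_conv_1d(x, depthwise_kernels, pointwise_weights, rate=1):
    """Depthwise atrous convolution followed by a pointwise convolution."""
    return pointwise(atrous_depthwise_1d(x, depthwise_kernels, rate),
                     pointwise_weights)


def effective_kernel_size(k, rate):
    """Input span covered by a k-tap filter applied with the given rate."""
    return k + (k - 1) * (rate - 1)


def conv_weight_counts(k, c_in, c_out):
    """Weights (biases ignored) of a standard k x k convolution versus a
    depthwise separable one with the same input and output channels."""
    standard = k * k * c_in * c_out
    separable = k * k * c_in + c_in * c_out
    return standard, separable


if __name__ == "__main__":
    signal = [[1, 2, 3, 4, 5], [1, 1, 1, 1, 1]]  # two channels, five samples
    depthwise = [[1, 1, 1], [1, 1, 1]]           # one 3-tap filter per channel
    mixing = [[1, 10]]                           # one output channel
    print(atrous_separable_conv_1d(signal, depthwise, mixing, rate=2))
    print(effective_kernel_size(3, 2))
    print(conv_weight_counts(3, 256, 256))
```

The two helper functions illustrate why the combination is attractive: raising the rate widens the span of input a filter covers without adding weights, and factoring a convolution into depthwise and pointwise steps needs far fewer weights than a standard convolution of the same shape.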

Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation

22 Aug 2018 | Liang-Chieh Chen, Yukun Zhu, George Papandreou, Florian Schroff, and Hartwig Adam