23 Mar 2018 | Hang Zhang, Kristin Dana, Jianping Shi, Zhongyue Zhang, Xiaogang Wang, Ambrish Tyagi, Amit Agrawal
This paper introduces a Context Encoding Module for semantic segmentation that captures the global context of a scene and selectively highlights class-dependent feature maps, strengthening the network's use of scene context. The module adds little computation yet markedly improves segmentation results, reaching 51.7% mIoU on PASCAL-Context and 85.9% mIoU on PASCAL VOC 2012. A single EncNet-101 model scores 0.5567 on the ADE20K test set, surpassing the winning entry of the COCO-Place Challenge 2017. The module also benefits image classification, reducing the error rate of a 14-layer ResNet on CIFAR-10 to 3.45%. Because it is lightweight and compatible with existing FCN-based approaches, the Context Encoding Module is useful for both semantic segmentation and broader visual recognition. Training is regularized with a Semantic Encoding Loss (SE-loss), which forces the network to predict which object categories are present in the scene and thereby improves its understanding of global context. The resulting Context Encoding Network (EncNet) places the module on top of a pre-trained ResNet, and extensive experiments on multiple datasets demonstrate its effectiveness for scene parsing and semantic segmentation.
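To make the mechanism concrete, below is a minimal PyTorch sketch of how such a module could be structured. It is an illustration under assumptions, not the authors' implementation: the EncodingLayer here is a simplified, memory-naive version of the encoding layer used in the paper, and names such as ContextEncodingModule, num_codes, and se_head are chosen for clarity. The idea is to encode the feature map into a fixed-length context vector, predict channel-wise scaling factors from it, and expose a second output head for the Semantic Encoding Loss.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class EncodingLayer(nn.Module):
    """Simplified encoding layer (sketch): softly assigns each spatial
    descriptor to K learned codewords and aggregates the residuals into
    a fixed-length encoding. Memory-naive; for illustration only."""
    def __init__(self, channels, num_codes):
        super().__init__()
        self.codewords = nn.Parameter(torch.randn(num_codes, channels) * 0.1)
        self.scale = nn.Parameter(torch.rand(num_codes))  # per-codeword smoothing

    def forward(self, x):                               # x: (B, C, H, W)
        b, c, h, w = x.shape
        x = x.reshape(b, c, -1).permute(0, 2, 1)        # (B, N, C), N = H*W
        # residuals between every descriptor and every codeword: (B, N, K, C)
        resid = x.unsqueeze(2) - self.codewords.unsqueeze(0).unsqueeze(0)
        # soft-assignment weights over codewords: (B, N, K)
        assign = F.softmax(-self.scale * resid.pow(2).sum(-1), dim=2)
        # aggregate weighted residuals over positions and codewords: (B, C)
        enc = (assign.unsqueeze(-1) * resid).sum(dim=(1, 2))
        return F.relu(enc)


class ContextEncodingModule(nn.Module):
    """Sketch of a Context Encoding Module: encode global context, rescale
    feature-map channels, and expose a head for the Semantic Encoding Loss."""
    def __init__(self, channels, num_codes=32, num_classes=59):
        super().__init__()
        self.encoding = EncodingLayer(channels, num_codes)
        self.fc = nn.Sequential(nn.Linear(channels, channels), nn.Sigmoid())
        self.se_head = nn.Linear(channels, num_classes)  # category-presence logits

    def forward(self, x):                               # x: (B, C, H, W)
        enc = self.encoding(x)                          # (B, C) context encoding
        gamma = self.fc(enc)                            # channel-wise scaling factors
        out = x * gamma.unsqueeze(-1).unsqueeze(-1)     # highlight class-dependent channels
        se_logits = self.se_head(enc)                   # input to the SE-loss
        return out, se_logits


# Hypothetical usage on stage-4 ResNet features:
feat = torch.randn(2, 512, 60, 60)
module = ContextEncodingModule(channels=512, num_codes=32, num_classes=59)
attended, se_logits = module(feat)
```

In EncNet this kind of module sits on top of the final stages of a pre-trained ResNet: the rescaled feature map feeds the segmentation head, while the SE-loss head is trained to predict which object categories appear in the scene, with targets derived directly from the ground-truth segmentation mask.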