Large Kernel Matters —— Improve Semantic Segmentation by Global Convolutional Network

8 Mar 2017 | Chao Peng, Xiangyu Zhang, Gang Yu, Guiming Luo, Jian Sun
This paper proposes a Global Convolutional Network (GCN) that improves semantic segmentation by addressing its two competing sub-tasks, classification and localization, simultaneously. To retain localization performance, the GCN is fully convolutional and uses no fully-connected or global pooling layers, which would discard spatial information. To strengthen classification, it adopts large kernels that densely connect the feature maps to the per-pixel classifiers, making the model more robust to input transformations. The GCN is paired with a Boundary Refinement (BR) block, which models boundary alignment as a residual structure and improves localization near object boundaries. The model uses a pretrained ResNet-152 as its feature network; in a segmentation-specific variant, the first two layers of the ResNet bottleneck block are replaced by a GCN module. In ablation experiments, the GCN is compared with a simple 1x1 convolution baseline and with stacks of small convolutions. It outperforms both, and the margin grows with kernel size. The approach achieves state-of-the-art results on two public benchmarks, 82.2% mean IoU on PASCAL VOC 2012 and 76.9% mean IoU on the Cityscapes test set, significantly outperforming previous results.
The GCN is shown to be effective in both pretrained models and segmentation-specific structures.
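The two blocks described above can be illustrated with a minimal PyTorch sketch. It is not the authors' implementation. The summary does not say how the large kernel is realized, so the sketch assumes a separable form: a k x 1 then 1 x k branch summed with a 1 x k then k x 1 branch, which keeps the parameter count linear in k. The class names `GlobalConvModule` and `BoundaryRefinement`, the helper `count_params`, the 256-channel input, the 21 output classes, and k = 15 are all illustrative assumptions.

```python
import torch
from torch import nn


class GlobalConvModule(nn.Module):
    """Large-kernel block (assumed separable form): two branches, summed."""

    def __init__(self, in_channels: int, num_classes: int, k: int = 15):
        super().__init__()
        if k % 2 == 0:
            raise ValueError("k must be odd so that padding keeps the spatial size")
        pad = k // 2
        # Branch 1: k x 1 followed by 1 x k, no nonlinearity in between.
        self.left = nn.Sequential(
            nn.Conv2d(in_channels, num_classes, kernel_size=(k, 1), padding=(pad, 0)),
            nn.Conv2d(num_classes, num_classes, kernel_size=(1, k), padding=(0, pad)),
        )
        # Branch 2: 1 x k followed by k x 1.
        self.right = nn.Sequential(
            nn.Conv2d(in_channels, num_classes, kernel_size=(1, k), padding=(0, pad)),
            nn.Conv2d(num_classes, num_classes, kernel_size=(k, 1), padding=(pad, 0)),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.left(x) + self.right(x)


class BoundaryRefinement(nn.Module):
    """Residual block: score map plus a small learned correction."""

    def __init__(self, channels: int):
        super().__init__()
        self.residual = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.residual(x)


def count_params(module: nn.Module) -> int:
    """Total number of learnable parameters in a module."""
    return sum(p.numel() for p in module.parameters())


# Hypothetical shapes: a 256-channel backbone feature map and 21 classes.
features = torch.randn(1, 256, 32, 32)
gcn = GlobalConvModule(256, 21, k=15)
br = BoundaryRefinement(21)
scores = br(gcn(features))  # per-pixel class scores at the input resolution

# A dense 15 x 15 convolution covering the same window, for comparison.
dense = nn.Conv2d(256, 21, kernel_size=15, padding=7)
print(scores.shape, count_params(gcn), count_params(dense))
```

Both blocks preserve the spatial size of their input, so the per-pixel classifier stays aligned with the feature map, which is the localization property the summary emphasizes. Under these assumed shapes the separable module needs far fewer parameters than the dense 15 x 15 convolution while covering the same window.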