CCNet: Criss-Cross Attention for Semantic Segmentation


July 2020 | Zilong Huang, Xinggang Wang, Yunchao Wei, Lichao Huang, Humphrey Shi, Wenyu Liu, and Thomas S. Huang
The authors propose the Criss-Cross Network (CCNet) for semantic segmentation, which captures full-image contextual information efficiently. CCNet introduces a criss-cross attention module that, for each pixel, harvests contextual information from all pixels on its criss-cross path, i.e. its row and column. Applying the module recurrently lets every pixel eventually capture dependencies from the whole image. A category consistent loss further enhances feature discriminability by encouraging pixels of the same category to produce similar feature vectors and pixels of different categories to produce dissimilar ones.

CCNet offers three main advantages: 1) GPU memory efficiency, requiring about 11× less memory than the non-local block; 2) high computational efficiency, using roughly 85% fewer FLOPs than the non-local block; and 3) strong performance on standard benchmarks, reaching mIoU scores of 81.9% on Cityscapes, 45.76% on ADE20K, and 55.47% on LIP, with additional results on COCO (instance segmentation) and CamVid.

The recurrent criss-cross attention can also be viewed as a graph neural network over criss-cross connections, with parameters shared across the recurrent applications of the module, and the method extends to 3D for long-range temporal context modeling. Extensive experiments on multiple datasets show that CCNet is efficient, effective, and generalizable, improving over previous methods in both accuracy and computational cost on 2D and 3D tasks, including semantic and instance segmentation.
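To make the mechanism concrete, below is a minimal PyTorch sketch of criss-cross attention and its recurrent application. The class and layer names (CrissCrossAttention, RCCAModule, query_conv, and so on), the channel-reduction factor, and the handling of a pixel's own position (counted in both its row and its column here) are illustrative assumptions, not the paper's released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class CrissCrossAttention(nn.Module):
    """Each pixel attends only to the pixels in its own row and column."""

    def __init__(self, in_channels, reduction=8):
        super().__init__()
        self.query_conv = nn.Conv2d(in_channels, in_channels // reduction, 1)
        self.key_conv = nn.Conv2d(in_channels, in_channels // reduction, 1)
        self.value_conv = nn.Conv2d(in_channels, in_channels, 1)
        self.gamma = nn.Parameter(torch.zeros(1))  # learnable residual weight

    def forward(self, x):
        b, _, h, w = x.shape
        q = self.query_conv(x)                        # (b, c', h, w)
        k = self.key_conv(x)                          # (b, c', h, w)
        v = self.value_conv(x)                        # (b, c,  h, w)

        # Row affinities: pixel (i, j) compared against all pixels in row i.
        energy_row = torch.matmul(q.permute(0, 2, 3, 1),   # (b, h, w, c')
                                  k.permute(0, 2, 1, 3))   # (b, h, c', w) -> (b, h, w, w)

        # Column affinities: pixel (i, j) compared against all pixels in column j.
        energy_col = torch.matmul(q.permute(0, 3, 2, 1),   # (b, w, h, c')
                                  k.permute(0, 3, 1, 2))   # (b, w, c', h) -> (b, w, h, h)

        # One softmax over the concatenated row + column affinities of each pixel.
        energy = torch.cat([energy_row,
                            energy_col.permute(0, 2, 1, 3)], dim=-1)  # (b, h, w, w + h)
        attn = F.softmax(energy, dim=-1)
        attn_row, attn_col = attn.split([w, h], dim=-1)

        # Aggregate value vectors along the row and along the column.
        out_row = torch.matmul(attn_row, v.permute(0, 2, 3, 1))       # (b, h, w, c)
        out_col = torch.matmul(attn_col.permute(0, 2, 1, 3),          # (b, w, h, h)
                               v.permute(0, 3, 2, 1))                 # (b, w, h, c)

        out = out_row + out_col.permute(0, 2, 1, 3)    # (b, h, w, c)
        out = out.permute(0, 3, 1, 2)                  # back to (b, c, h, w)
        return self.gamma * out + x                    # residual connection


class RCCAModule(nn.Module):
    """Recurrent criss-cross attention: the same module is applied R times
    (R = 2 in the paper), so its parameters are shared across the steps."""

    def __init__(self, in_channels, recurrence=2):
        super().__init__()
        self.recurrence = recurrence
        self.cca = CrissCrossAttention(in_channels)

    def forward(self, x):
        for _ in range(self.recurrence):
            x = self.cca(x)  # pass 1: row/column context; pass 2: full-image context
        return x
```

With a recurrence of two, the context a pixel gathers from its row and column in the first pass is redistributed along the rows and columns of those pixels in the second pass, so every pixel ends up connected to every other pixel while each pass only touches h + w positions per pixel instead of h × w.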
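The category consistent loss can be illustrated with a simplified pull/push formulation: pixel features are pulled toward the mean feature of their ground-truth class, and class means are pushed apart. The function name, the margins delta_v and delta_d, and the exact distance terms below are assumptions for illustration; the paper defines its own piecewise distance functions and weighting.

```python
import torch
import torch.nn.functional as F


def category_consistent_loss(features, labels, delta_v=0.5, delta_d=1.5):
    """A simplified pull/push loss.

    features: (n_pixels, c) feature vectors sampled from the feature map.
    labels:   (n_pixels,) ground-truth class ids for those pixels.
    """
    classes = labels.unique()
    means, var_loss = [], features.new_zeros(())
    for cls in classes:
        feats = features[labels == cls]                  # pixels of one class
        mu = feats.mean(dim=0)
        means.append(mu)
        # Pull: penalize pixels farther than delta_v from their class mean.
        dist = (feats - mu).norm(dim=1)
        var_loss = var_loss + F.relu(dist - delta_v).pow(2).mean()
    var_loss = var_loss / len(classes)

    # Push: penalize pairs of class means closer than 2 * delta_d.
    dist_loss = features.new_zeros(())
    if len(classes) > 1:
        means = torch.stack(means)                       # (n_classes, c)
        pairwise = torch.cdist(means, means)             # (n_classes, n_classes)
        off_diag = ~torch.eye(len(classes), dtype=torch.bool, device=means.device)
        dist_loss = F.relu(2 * delta_d - pairwise[off_diag]).pow(2).mean()

    return var_loss + dist_loss
```

The pull term keeps features of one category in a tight cluster, and the push term keeps clusters of different categories separated, which is the behavior the summary attributes to the category consistent loss.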