CCNet: Criss-Cross Attention for Semantic Segmentation

JULY 2020 | Zilong Huang, Xinggang Wang, Member, IEEE, Yunchao Wei, Lichao Huang, Humphrey Shi, Member, IEEE, Wenyu Liu, Senior Member, IEEE, and Thomas S. Huang, Life Fellow, IEEE
The paper introduces the Criss-Cross Network (CCNet), an approach to semantic segmentation that captures full-image contextual information efficiently. CCNet uses a criss-cross attention module in which each pixel aggregates context from all pixels on its row and column, substantially reducing computational complexity and GPU memory usage compared to non-local blocks. The module is applied recurrently (two passes suffice), so each pixel can capture dependencies from the entire image. A category consistent loss is also proposed to encourage the module to produce more discriminative features. Extensive experiments on Cityscapes, ADE20K, LIP, COCO, and CamVid show that CCNet achieves state-of-the-art mIoU scores, outperforming existing methods. The source code is available at <https://github.com/speedinghzl/CCNet>.
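To make the idea concrete, below is a minimal PyTorch sketch of criss-cross attention, not the authors' reference implementation: the class name, the channel `reduction` factor, and the learnable residual scale `gamma` are illustrative assumptions. Each pixel computes affinities only with the H + W - 1 pixels on its row and column, rather than all H x W pixels as in a non-local block. (The official code additionally masks the center pixel, which appears in both the row and the column path; that detail is omitted here for brevity.)

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrissCrossAttention(nn.Module):
    """Sketch of criss-cross attention: each pixel attends to the
    pixels in its own row and column instead of the full feature map."""

    def __init__(self, in_channels: int, reduction: int = 8):
        super().__init__()
        self.query = nn.Conv2d(in_channels, in_channels // reduction, 1)
        self.key = nn.Conv2d(in_channels, in_channels // reduction, 1)
        self.value = nn.Conv2d(in_channels, in_channels, 1)
        self.gamma = nn.Parameter(torch.zeros(1))  # learnable residual scale

    def forward(self, x):
        b, c, h, w = x.shape
        q, k, v = self.query(x), self.key(x), self.value(x)

        # Row affinities: for each row, every pixel vs. every column.
        q_h = q.permute(0, 2, 3, 1)              # (b, h, w, c')
        k_h = k.permute(0, 2, 1, 3)              # (b, h, c', w)
        energy_h = torch.matmul(q_h, k_h)        # (b, h, w, w)

        # Column affinities: for each column, every pixel vs. every row.
        q_w = q.permute(0, 3, 2, 1)              # (b, w, h, c')
        k_w = k.permute(0, 3, 1, 2)              # (b, w, c', h)
        energy_w = torch.matmul(q_w, k_w)        # (b, w, h, h)
        energy_w = energy_w.permute(0, 2, 1, 3)  # (b, h, w, h)

        # Joint softmax over the criss-cross path (w + h positions).
        attn = F.softmax(torch.cat([energy_h, energy_w], dim=-1), dim=-1)
        attn_h, attn_w = attn[..., :w], attn[..., w:]

        # Aggregate value features along the row ...
        v_h = v.permute(0, 2, 3, 1)              # (b, h, w, c)
        out_h = torch.matmul(attn_h, v_h)        # (b, h, w, c)
        # ... and along the column.
        v_w = v.permute(0, 3, 2, 1)              # (b, w, h, c)
        out_w = torch.matmul(attn_w.permute(0, 2, 1, 3), v_w)
        out_w = out_w.permute(0, 2, 1, 3)        # (b, h, w, c)

        out = (out_h + out_w).permute(0, 3, 1, 2)  # back to (b, c, h, w)
        return self.gamma * out + x
```

The recurrence in the paper amounts to applying this module repeatedly with shared weights; after the first pass a pixel holds its row and column context, and a second pass propagates that to every other position, which is why two recurrences already yield full-image dependencies:

```python
cca = CrissCrossAttention(64)
x = torch.randn(1, 64, 32, 48)
for _ in range(2):  # R = 2 recurrences give full-image context
    x = cca(x)
```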