Dual Attention Network for Scene Segmentation

21 Apr 2019 | Jun Fu, Jing Liu, Haijie Tian, Yong Li, Yongjun Bao, Zhiwei Fang, Hanqing Lu
This paper proposes the Dual Attention Network (DANet) for scene segmentation, which uses self-attention mechanisms to capture rich contextual dependencies between local features. Unlike previous methods that rely on multi-scale feature fusion, DANet adaptively integrates local features with their global dependencies by appending two attention modules on top of a dilated FCN: a position attention module and a channel attention module. The position attention module selectively aggregates the feature at each position via a weighted sum of the features at all positions, while the channel attention module emphasizes interdependent channel maps by integrating associated features among all channel maps. The outputs of the two modules are summed to enhance the feature representation, leading to more precise segmentation results.

DANet achieves state-of-the-art performance on three challenging scene segmentation datasets: Cityscapes, PASCAL Context, and COCO Stuff, including a Mean IoU of 81.5% on the Cityscapes test set without using coarse data. By explicitly modeling long-range contextual information, the position attention module captures spatial interdependencies and the channel attention module captures channel interdependencies, which improves feature representations and semantic consistency in complex and diverse scenes. Comprehensive experiments show that DANet outperforms existing methods on all three datasets. The modules are simple to implement and can be inserted into existing FCN pipelines without significantly increasing computational complexity.
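To make the two modules concrete, below is a minimal PyTorch sketch of position attention and channel attention as described above, followed by the element-wise-sum fusion. The class and variable names, the channel-reduction factor of 8, and the plain softmax in the channel branch are illustrative assumptions rather than the authors' released implementation.

```python
import torch
import torch.nn as nn

class PositionAttention(nn.Module):
    """Position attention: the feature at each position is updated with a
    weighted sum of the features at all positions (spatial self-attention)."""
    def __init__(self, channels, reduction=8):  # reduction=8 is an assumed setting
        super().__init__()
        self.query = nn.Conv2d(channels, channels // reduction, 1)
        self.key = nn.Conv2d(channels, channels // reduction, 1)
        self.value = nn.Conv2d(channels, channels, 1)
        self.gamma = nn.Parameter(torch.zeros(1))  # learned scale, starts at 0

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)   # (B, HW, C')
        k = self.key(x).flatten(2)                     # (B, C', HW)
        attn = torch.softmax(q @ k, dim=-1)            # (B, HW, HW) spatial affinities
        v = self.value(x).flatten(2)                   # (B, C, HW)
        out = (v @ attn.transpose(1, 2)).view(b, c, h, w)
        return self.gamma * out + x                    # residual fusion with the input

class ChannelAttention(nn.Module):
    """Channel attention: each channel map is updated with a weighted sum of
    all channel maps, computed directly from the input features."""
    def __init__(self):
        super().__init__()
        self.gamma = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        b, c, h, w = x.shape
        feat = x.flatten(2)                            # (B, C, HW)
        # Plain softmax over channel affinities; an assumption made here for
        # simplicity (released DANet code normalizes this slightly differently).
        attn = torch.softmax(feat @ feat.transpose(1, 2), dim=-1)  # (B, C, C)
        out = (attn @ feat).view(b, c, h, w)
        return self.gamma * out + x

if __name__ == "__main__":
    feats = torch.randn(2, 512, 32, 32)  # e.g. dilated-FCN features at stride 8
    fused = PositionAttention(512)(feats) + ChannelAttention()(feats)
    print(fused.shape)  # torch.Size([2, 512, 32, 32])
```

In the full network, each attention branch is also wrapped with convolution layers before and after the attention operation, and the fused features feed a final prediction layer; those details are omitted from this sketch for brevity.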