ELA: Efficient Local Attention for Deep Convolutional Neural Networks


2 Mar 2024 | Wei Xu and Yi Wan
This paper introduces an Efficient Local Attention (ELA) method for deep convolutional neural networks (CNNs) to enhance performance without reducing channel dimensions. ELA addresses limitations of existing attention mechanisms, such as the Coordinate Attention (CA) method, which suffers from reduced channel dimensions and poor generalization due to Batch Normalization (BN). ELA uses 1D convolution and Group Normalization (GN) to efficiently encode spatial information, enabling accurate localization of regions of interest without channel reduction. The method is designed to be lightweight and integrates seamlessly into CNNs like ResNet, MobileNet, and DeepLab. ELA is implemented with three hyperparameters, resulting in four versions: ELA-T, ELA-B, ELA-S, and ELA-L, tailored for different visual tasks. Experimental results on the ImageNet, MS COCO, and Pascal VOC datasets show that ELA outperforms state-of-the-art attention methods in image classification, object detection, and semantic segmentation. ELA achieves significant performance improvements with minimal parameter increase, demonstrating its efficiency and effectiveness. The method is validated through visualization and comparison with other attention modules, showing superior localization and accuracy. The paper concludes that ELA is a lightweight, effective attention mechanism that enhances CNN performance without compromising channel dimensions.
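To make the described mechanism concrete, here is a minimal PyTorch sketch of an ELA-style module, assuming the design outlined above: strip pooling along the height and width axes, a shared 1D convolution that preserves the channel dimension, Group Normalization, and a sigmoid gate. The hyperparameter values (`kernel_size`, `groups`, `num_groups`) are illustrative assumptions, not the paper's tuned settings for any particular ELA variant.

```python
import torch
import torch.nn as nn


class ELA(nn.Module):
    """Sketch of Efficient Local Attention (ELA).

    Assumed pipeline: average-pool along W and along H to get two strip
    descriptors, pass each through a 1D convolution (no channel
    reduction), normalize with GroupNorm, gate with a sigmoid, and
    multiply both attention maps back onto the input.
    """

    def __init__(self, channels: int, kernel_size: int = 7,
                 groups: int = 16, num_groups: int = 16):
        super().__init__()
        pad = kernel_size // 2
        # 1D conv keeps the channel count intact (C in, C out).
        self.conv = nn.Conv1d(channels, channels, kernel_size,
                              padding=pad, groups=groups, bias=False)
        # GroupNorm instead of BatchNorm, per the paper's motivation.
        self.gn = nn.GroupNorm(num_groups, channels)
        self.act = nn.Sigmoid()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        x_h = x.mean(dim=3)  # strip pooling over W -> (B, C, H)
        x_w = x.mean(dim=2)  # strip pooling over H -> (B, C, W)
        a_h = self.act(self.gn(self.conv(x_h))).view(b, c, h, 1)
        a_w = self.act(self.gn(self.conv(x_w))).view(b, c, 1, w)
        return x * a_h * a_w
```

Because the module's output has the same shape as its input, it can be dropped after any convolutional stage of a backbone such as ResNet without further changes; `channels` must be divisible by both `groups` and `num_groups`.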