This paper proposes the Residual Attention Network (RAN) for image classification, which integrates an attention mechanism into a convolutional neural network (CNN) to enhance feature learning. The RAN is built by stacking multiple Attention Modules that generate attention-aware features, and the attention from different modules changes adaptively as the network grows deeper. Each Attention Module uses a bottom-up top-down feedforward structure that folds the usual feedforward and feedback attention processes into a single forward pass. The paper also introduces attention residual learning, which makes very deep RANs trainable and lets them scale to hundreds of layers.
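The attention residual learning idea can be summarized as H(x) = (1 + M(x)) * T(x), where T(x) is the output of a trunk branch doing ordinary feature processing and M(x) is a soft mask in (0, 1) produced by the bottom-up top-down branch. Because the mask only modulates a residual term, a near-zero mask leaves good trunk features intact. Below is a minimal PyTorch-style sketch of one such module; the layer widths, the single pooling stage, and all names are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionModule(nn.Module):
    """Sketch of one Attention Module with attention residual learning.

    Assumption: a single down/up-sampling stage in the mask branch;
    the paper stacks several, with skip connections between them.
    """

    def __init__(self, channels: int):
        super().__init__()
        # Trunk branch T(x): ordinary feedforward feature processing.
        self.trunk = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )
        # Bottom-up half of the soft mask branch: shrink resolution
        # to gather global context.
        self.mask_down = nn.Sequential(
            nn.MaxPool2d(2),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
        )
        # 1x1 conv + sigmoid yields per-position, per-channel ("mixed")
        # attention weights in (0, 1).
        self.mask_out = nn.Sequential(
            nn.Conv2d(channels, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        t = self.trunk(x)
        m = self.mask_down(x)
        # Top-down half: upsample back to the trunk's spatial size.
        m = F.interpolate(m, size=t.shape[-2:], mode="bilinear",
                          align_corners=False)
        m = self.mask_out(m)
        # Attention residual learning: H(x) = (1 + M(x)) * T(x).
        return (1 + m) * t
```

Naive soft masking, H(x) = M(x) * T(x), repeatedly scales features toward zero as modules stack and can break the identity mapping that deep residual networks rely on; the 1 + M(x) form preserves the trunk signal and lets the mask add emphasis on top, which is what permits stacking to hundreds of layers.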
The RAN is evaluated on CIFAR-10, CIFAR-100, and ImageNet, achieving state-of-the-art performance: 3.90% error on CIFAR-10, 20.45% error on CIFAR-100, and 4.8% top-5 error on ImageNet. Notably, a RAN outperforms ResNet-200 by 0.6% in top-1 accuracy while using only 46% of its trunk depth and 69% of its forward FLOPs. The network is also robust to noisy labels.
The RAN's key contributions are threefold: a stacked network structure in which each module learns its own mixed attention, attention residual learning that makes very deep attention networks trainable, and a bottom-up top-down feedforward structure that approximates feedback attention within a single forward pass. The design trains end to end and captures different types of attention without extra supervision. Extensive experiments across these datasets validate its effectiveness for image classification, and its robustness to noisy labels, together with its parameter and FLOP efficiency, makes it a promising approach for future research in deep learning.
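As a concrete reading of the stacked structure, the sketch below chains the AttentionModule defined earlier into a small end-to-end classifier. The stage count, channel width, and classifier head are assumptions for illustration and do not reproduce the paper's Attention-56/92 configurations.

```python
class ResidualAttentionNet(nn.Module):
    """Illustrative stack of Attention Modules; not the paper's exact layout."""

    def __init__(self, num_classes: int = 10, channels: int = 64):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(3, channels, 3, padding=1),
            nn.ReLU(inplace=True),
        )
        # Each stacked module learns its own soft mask, so the attention
        # it applies adapts as the features change with depth.
        self.stages = nn.Sequential(
            AttentionModule(channels),
            AttentionModule(channels),
            AttentionModule(channels),
        )
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(channels, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(self.stages(self.stem(x)))


# A CIFAR-sized batch flows end to end, so the whole stack is trainable
# with ordinary backpropagation.
logits = ResidualAttentionNet()(torch.randn(2, 3, 32, 32))
print(logits.shape)  # torch.Size([2, 10])
```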