September 2022 | Meng-Hao Guo, Tian-Xing Xu, Jiang-Jiang Liu, Zheng-Ning Liu, Peng-Tao Jiang, Tai-Jiang Mu, Song-Hai Zhang, Ralph R. Martin, Ming-Ming Cheng, and Shi-Min Hu
Attention mechanisms have become essential in computer vision for tasks such as image classification, object detection, and semantic segmentation. This survey provides a comprehensive review of attention mechanisms, categorizing them into channel, spatial, temporal, and branch attention, along with hybrid categories. It traces the evolution of attention from early methods such as RAM and STN to modern self-attention approaches, highlighting key channel-attention works including SENet, GSoP-Net, SRM, GCT, and ECANet, which improve feature selection and the capture of global information. The survey also covers spatial attention methods such as RAM, glimpse networks, hard/soft attention, and STN, which focus computation on important regions. Self-attention mechanisms, including those used in vision transformers, are discussed for their ability to model global relationships.
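As a concrete illustration of the channel-attention idea behind SENet, the squeeze-and-excitation operation can be sketched as below. This is a minimal NumPy sketch, not the authors' implementation: the weights `w1`/`w2` and the reduction ratio `r` stand in for learned parameters.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def se_channel_attention(x, w1, w2):
    """Squeeze-and-Excitation style channel attention (sketch).

    x:  feature map of shape (C, H, W)
    w1: (C//r, C) reduction weights; w2: (C, C//r) expansion weights
    (w1 and w2 are hypothetical stand-ins for learned parameters.)
    """
    # Squeeze: global average pooling over spatial dims -> (C,)
    z = x.mean(axis=(1, 2))
    # Excitation: bottleneck MLP (ReLU then sigmoid) -> per-channel gate in (0, 1)
    s = sigmoid(w2 @ np.maximum(w1 @ z, 0.0))
    # Reweight each channel of the input feature map
    return x * s[:, None, None]

# Toy usage with reduction ratio r = 2
rng = np.random.default_rng(0)
C, H, W, r = 4, 8, 8, 2
x = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C // r, C))
w2 = rng.standard_normal((C, C // r))
y = se_channel_attention(x, w1, w2)
```

The gate `s` scales each channel independently, which is how SENet-style methods emphasize informative channels and suppress less useful ones.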
The paper concludes that attention mechanisms have significant potential to replace convolutional networks and enhance performance in various visual tasks.
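The self-attention discussed above, as used in vision transformers, can be sketched as scaled dot-product attention over a sequence of patch tokens. This is a minimal assumed sketch; the projection matrices `wq`, `wk`, `wv` are hypothetical stand-ins for learned parameters.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, wq, wk, wv):
    """Scaled dot-product self-attention over patch tokens (sketch).

    x: (N, D) token features; wq, wk, wv: (D, D) hypothetical projections.
    """
    q, k, v = x @ wq, x @ wk, x @ wv
    # Every token attends to every other token: global pairwise relationships
    attn = softmax(q @ k.T / np.sqrt(x.shape[1]), axis=-1)
    return attn @ v

# Toy usage: 6 patch tokens with 4-dimensional features
rng = np.random.default_rng(1)
N, D = 6, 4
x = rng.standard_normal((N, D))
wq, wk, wv = (rng.standard_normal((D, D)) for _ in range(3))
out = self_attention(x, wq, wk, wv)
```

Because the attention matrix couples every pair of tokens, the receptive field is global from the first layer, which is the property that lets self-attention model long-range relationships that convolutions capture only through deep stacking.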