14 Dec 2015 | Bolei Zhou, Aditya Khosla, Agata Lapedriza, Aude Oliva, Antonio Torralba
This paper revisits the global average pooling (GAP) layer in convolutional neural networks (CNNs) and shows that, despite being trained only on image-level labels, GAP networks acquire remarkable localization ability. The authors show that GAP builds a generic, localizable deep representation applicable to a range of tasks, achieving a top-5 error rate of 37.1% for object localization on ILSVRC 2014, close to the 34.2% achieved by fully supervised CNNs. They introduce Class Activation Mapping (CAM), a technique that projects the weights of the output layer back onto the convolutional feature maps, highlighting the discriminative image regions for each class in a single forward pass. This approach preserves competitive classification performance while substantially improving localization, even on tasks the network was not originally trained for. The paper also examines the generic localization ability of the deep features learned by GAP CNNs, demonstrating their effectiveness in fine-grained recognition and pattern discovery. Overall, the work highlights the potential of GAP and CAM for improving the localization capabilities of CNNs and for providing insight into their internal representations.
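The CAM computation described above reduces to a weighted sum of the last convolutional layer's feature maps, with the weights taken from the linear classifier that follows GAP. A minimal NumPy sketch, assuming feature maps of shape (K, H, W) and classifier weights of shape (C, K) (the array names and shapes here are illustrative, not from the paper's code):

```python
import numpy as np

def class_activation_map(feature_maps, weights, class_idx):
    """Compute a class activation map for one class.

    feature_maps: (K, H, W) activations from the final conv layer
    weights:      (C, K) weights of the linear layer applied after GAP
    class_idx:    index c of the target class

    Returns an (H, W) map: CAM_c(x, y) = sum_k w_c[k] * f_k(x, y).
    """
    w_c = weights[class_idx]                       # (K,)
    cam = np.tensordot(w_c, feature_maps, axes=1)  # (H, W)
    # Normalize to [0, 1] for visualization as a heatmap
    cam -= cam.min()
    if cam.max() > 0:
        cam /= cam.max()
    return cam

# Toy example: 4 feature maps of size 7x7, a 10-class classifier
rng = np.random.default_rng(0)
F = rng.random((4, 7, 7))
W = rng.random((10, 4))
cam = class_activation_map(F, W, class_idx=3)
print(cam.shape)  # (7, 7)
```

In practice the (H, W) map is upsampled to the input image's resolution and overlaid as a heatmap; in the paper, thresholding this map yields the bounding boxes used for the localization results quoted above.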