3 Dec 2019 | Ramprasaath R. Selvaraju · Michael Cogswell · Abhishek Das · Ramakrishna Vedantam · Devi Parikh · Dhruv Batra
The paper introduces Grad-CAM (Gradient-weighted Class Activation Mapping), a technique for generating visual explanations from Convolutional Neural Network (CNN) models, making them more transparent and explainable. Grad-CAM uses the gradients of a target concept flowing into the final convolutional layer to produce a coarse localization map that highlights the image regions important for predicting that concept. Unlike previous methods such as CAM, Grad-CAM is applicable to a wide range of CNN architectures without requiring architectural changes or retraining. The authors combine Grad-CAM with existing fine-grained, pixel-space visualizations (Guided Backpropagation) to create Guided Grad-CAM, which provides high-resolution, class-discriminative visualizations. They evaluate Grad-CAM on image classification, image captioning, and visual question answering (VQA) models, showing that it helps diagnose model failures, is robust to adversarial perturbations, and can expose dataset biases. Human studies show that Guided Grad-CAM explanations help users trust the models and discern a 'stronger' network from a 'weaker' one, even when both make identical predictions. The paper also discusses how Grad-CAM generalizes across CNN architectures, including ResNet-based models.
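Concretely, Grad-CAM computes a neuron-importance weight for each feature map $A^k$ of a convolutional layer by global-average-pooling the gradients of the class score $y^c$ (before the softmax), then applies a ReLU to the weighted combination of feature maps:

$$\alpha_k^c = \frac{1}{Z}\sum_i \sum_j \frac{\partial y^c}{\partial A_{ij}^k}, \qquad L^c_{\text{Grad-CAM}} = \mathrm{ReLU}\!\Big(\sum_k \alpha_k^c A^k\Big)$$

The sketch below shows this computation in PyTorch. It is a minimal illustration, not the authors' released code: it assumes torchvision's pretrained VGG-16, hooks its last convolutional layer (`features[28]`), and expects an already-preprocessed 224×224 input tensor.

```python
# Minimal Grad-CAM sketch (illustrative; assumes torchvision's pretrained VGG-16).
import torch
import torch.nn.functional as F
from torchvision import models

model = models.vgg16(pretrained=True).eval()

activations, gradients = {}, {}

def fwd_hook(module, inp, out):
    activations["value"] = out        # feature maps A^k of the hooked layer

def bwd_hook(module, grad_in, grad_out):
    gradients["value"] = grad_out[0]  # gradients dy^c / dA^k

# Hook the last convolutional layer of VGG-16's feature extractor.
target_layer = model.features[28]
target_layer.register_forward_hook(fwd_hook)
target_layer.register_full_backward_hook(bwd_hook)

def grad_cam(image, class_idx=None):
    """image: preprocessed tensor of shape (1, 3, 224, 224)."""
    scores = model(image)                 # class scores y^c (pre-softmax)
    if class_idx is None:
        class_idx = scores.argmax(dim=1).item()
    model.zero_grad()
    scores[0, class_idx].backward()       # populates gradients via the hook

    # alpha_k^c: global-average-pool the gradients over spatial locations
    weights = gradients["value"].mean(dim=(2, 3), keepdim=True)
    # ReLU over the weighted combination of feature maps
    cam = F.relu((weights * activations["value"]).sum(dim=1, keepdim=True))
    # Upsample the coarse map to the input resolution for overlay
    cam = F.interpolate(cam, size=image.shape[2:],
                        mode="bilinear", align_corners=False)
    cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)  # normalize to [0, 1]
    return cam, class_idx
```

The returned map can be overlaid on the input image as a heatmap; for Guided Grad-CAM, it would be multiplied elementwise with a Guided Backpropagation visualization to obtain a high-resolution, class-discriminative result.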