Interpretable Explanations of Black Boxes by Meaningful Perturbation

3 Dec 2021 | Ruth C. Fong, Andrea Vedaldi
This paper introduces a general framework for learning interpretable explanations of black-box algorithms such as deep neural networks. Because it is grounded in explicit, interpretable image perturbations, the framework is model-agnostic and testable. The main contributions are: (1) a general framework for learning different kinds of explanations for any black-box algorithm, and (2) a specialization of the framework that finds the part of an image most responsible for a classifier's decision.

The framework is tested on image classification, where it identifies the regions of an image that most influence the classifier's output. Compared with other saliency techniques, such as gradient-based saliency, guided backpropagation, and Grad-CAM, it produces more interpretable and accurate results. The paper also discusses the challenges of explaining black-box models, including the risk of introducing artifacts and the need to carefully balance the generality and interpretability of explanations. Evaluated on object localization, pointing, and adversarial defense, the method outperforms competing approaches, demonstrating that the proposed framework is a principled and effective way to explain black-box models.
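The specialized method can be read as a "deletion game": learn the smallest, smoothest mask such that blurring the masked-out region causes the largest drop in the classifier's score for the target class. The sketch below is a minimal illustration of that idea, not the authors' code; it assumes a PyTorch image classifier `model` that returns logits and a preprocessed input tensor, and the function name `explain_with_mask` and all hyperparameter values are illustrative.

# Minimal sketch of mask-based perturbation explanations (illustrative, not the paper's code).
import torch
import torch.nn.functional as F
import torchvision

def explain_with_mask(model, image, target_class,
                      mask_size=28, steps=300, lr=0.1,
                      area_weight=1e-2, tv_weight=1e-1, blur_sigma=10.0):
    # image: (1, 3, H, W) tensor, already preprocessed for `model`.
    model.eval()
    # A heavily blurred copy of the image serves as the "deleted" reference.
    blurred = torchvision.transforms.functional.gaussian_blur(
        image, kernel_size=51, sigma=blur_sigma)

    # Coarse deletion mask, optimized in logit space so values stay in (0, 1).
    mask_logits = torch.zeros(1, 1, mask_size, mask_size,
                              device=image.device, requires_grad=True)
    optimizer = torch.optim.Adam([mask_logits], lr=lr)

    for _ in range(steps):
        mask = torch.sigmoid(mask_logits)          # 1 = delete (blur), 0 = keep
        up = F.interpolate(mask, size=image.shape[-2:],
                           mode='bilinear', align_corners=False)
        perturbed = (1.0 - up) * image + up * blurred

        # Probability the classifier still assigns to the target class.
        score = torch.softmax(model(perturbed), dim=1)[0, target_class]
        # Penalize large deleted areas and non-smooth masks (total variation).
        area = up.mean()
        tv = ((mask[..., 1:, :] - mask[..., :-1, :]).abs().mean()
              + (mask[..., :, 1:] - mask[..., :, :-1]).abs().mean())
        loss = score + area_weight * area + tv_weight * tv

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    # The learned mask highlights the region whose removal most hurts the score.
    return torch.sigmoid(mask_logits).detach()

Under these assumptions, one would call explain_with_mask(model, x, class_idx) on a single preprocessed image and upsample the returned low-resolution mask for visualization; the coarse mask and the area and smoothness penalties reflect the paper's point that unconstrained perturbations risk producing adversarial-like artifacts rather than interpretable explanations.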