July 10, 2015 | Sebastian Bach, Alexander Binder, Grégoire Montavon, Frederick Klauschen, Klaus-Robert Müller, Wojciech Samek
This paper introduces a method for pixel-wise explanation of non-linear classifier decisions using layer-wise relevance propagation (LRP). The goal is to understand how individual pixels contribute to the final classification decision. The method is applicable to both kernel-based classifiers using Bag of Words (BoW) features and multilayer neural networks. The approach decomposes the classifier's decision into contributions from individual pixels, which can be visualized as heatmaps to help human experts interpret the model's decisions.
The paper presents a general framework for pixel-wise decomposition, which is applied to both BoW models and neural networks. For BoW models, the method decomposes the classifier's decision into contributions from individual feature dimensions, which are then further decomposed into contributions from local image features and finally into contributions from individual pixels. For neural networks, the method uses layer-wise relevance propagation to propagate relevance scores from the output layer back to the input layer, ensuring that the total relevance is preserved across layers.
The method is evaluated on several datasets, including PASCAL VOC 2009 images, synthetic image data, the MNIST handwritten digits dataset, and ImageNet images classified with a pre-trained network. The results show that the method identifies the image regions most relevant to the classifier's decision, providing insight into the model's behavior.
The paper also discusses limitations, in particular that Taylor-based decomposition requires choosing a suitable root point and only approximates highly non-linear decision functions locally. The authors argue that layer-wise relevance propagation, by contrast, applies to a wide range of architectures without approximation by means of a Taylor expansion.
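For reference, the root-point issue can be stated concisely: a first-order Taylor decomposition explains f(x) relative to a root point x̃ chosen so that f(x̃) ≈ 0, and the individual terms of the expansion serve as per-pixel relevances. The formulation below is the standard one; finding a suitable x̃ is the practical difficulty the paper points out.

```latex
% First-order Taylor decomposition around a root point \tilde{x} with f(\tilde{x}) \approx 0:
f(x) \approx f(\tilde{x}) + \sum_{d=1}^{V}
  \frac{\partial f}{\partial x_d}\bigg|_{x=\tilde{x}} (x_d - \tilde{x}_d)
% Each summand is taken as the relevance of input dimension d:
R_d = \frac{\partial f}{\partial x_d}\bigg|_{x=\tilde{x}} (x_d - \tilde{x}_d)
```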
Overall, the paper provides a general solution to the problem of understanding and interpreting the decisions of non-linear classifiers, with a focus on pixel-wise explanations that can be used to improve the interpretability of machine learning models.