24 Jun 2017 | Grégoire Montavon, Wojciech Samek, Klaus-Robert Müller
This paper provides an overview of techniques for interpreting deep neural networks (DNNs), focusing on methods to understand and explain their predictions. It discusses several approaches, including activation maximization, sensitivity analysis, Taylor decomposition, and layer-wise relevance propagation (LRP), and emphasizes the importance of interpretability in applications such as medicine and autonomous vehicles, where model reliability is critical. It also highlights the field's shift from simple, inherently interpretable models to complex, high-performing ones, and argues that advanced interpretation techniques can now explain even the most complex models.

The paper distinguishes interpretation, mapping an abstract concept onto a domain humans can make sense of, from explanation, identifying the input features that contribute to a particular decision. For interpretation it presents activation maximization, optionally regularized by an expert model of the data; for explanation it covers sensitivity analysis, Taylor decomposition, and LRP. It also discusses practical considerations, such as using pooling and filtering to manage large input spaces, and gives recommendations for implementing LRP, including how to choose layers and propagation rules. It concludes with applications of these techniques to model validation and scientific data analysis.
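As a rough illustration of activation maximization, the sketch below runs gradient ascent on the input of a pretrained classifier to find a pattern that maximally activates one class logit. The `model`, the zero starting point, the step count, and the plain l2 penalty (standing in for the expert/density model discussed in the paper) are all illustrative assumptions.

```python
import torch

def activation_maximization(model, target_class, input_shape,
                            steps=200, lr=0.1, lam=0.01):
    """Gradient ascent on the input to maximize one class logit."""
    model.eval()
    x = torch.zeros(1, *input_shape, requires_grad=True)  # neutral starting point (assumed)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        logit = model(x)[0, target_class]
        # maximize the class logit; the l2 term is a stand-in for a proper data prior
        loss = -logit + lam * (x ** 2).sum()
        loss.backward()
        opt.step()
    return x.detach()
```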
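Sensitivity analysis scores each input feature by the squared partial derivative of the class score with respect to that feature. A minimal sketch, assuming a differentiable PyTorch classifier `model` and a single unbatched input `x`:

```python
import torch

def sensitivity_map(model, x, target_class):
    """Relevance R_i = (d f_c / d x_i)^2, obtained from one backward pass."""
    x = x.clone().detach().requires_grad_(True)
    score = model(x.unsqueeze(0))[0, target_class]  # score f_c(x) of the explained class
    score.backward()
    return x.grad.pow(2)                            # one relevance value per input feature
```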
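Simple Taylor decomposition instead expands the class score around a root point x_tilde where the score (approximately) vanishes and keeps the first-order terms R_i = [df/dx_i](x_tilde) * (x_i - x_tilde_i) as relevances. In the sketch below the zero input is used as the root point, which is only an assumption for illustration; the paper discusses how a suitable root point should be chosen.

```python
import torch

def taylor_relevance(model, x, target_class, x_tilde=None):
    """First-order Taylor terms of the class score around a root point."""
    if x_tilde is None:
        x_tilde = torch.zeros_like(x)                # assumed root point with f(x_tilde) ~ 0
    x_tilde = x_tilde.clone().detach().requires_grad_(True)
    score = model(x_tilde.unsqueeze(0))[0, target_class]
    score.backward()                                 # gradient evaluated at the root point
    return (x_tilde.grad * (x - x_tilde)).detach()   # R_i = grad_i * (x_i - x_tilde_i)
```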
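Layer-wise relevance propagation redistributes the class score backwards through the network, layer by layer, until it reaches the input features. The sketch below applies only the epsilon rule to a toy network given explicitly as a list of (weight, bias) pairs of fully connected ReLU layers acting on a flat feature vector; practical implementations follow the paper's recommendation to combine different propagation rules across layers.

```python
import torch

def lrp_epsilon(layers, x, target_class, eps=1e-2):
    """LRP with the epsilon rule for a stack of fully connected ReLU layers."""
    # forward pass, keeping the activation that enters each layer
    activations, a = [x], x
    for W, b in layers:
        a = torch.relu(W @ a + b)
        activations.append(a)
    # relevance at the output: the score of the explained class only
    R = torch.zeros_like(activations[-1])
    R[target_class] = activations[-1][target_class]
    # backward pass: redistribute relevance in proportion to each neuron's contribution
    for (W, b), a in zip(reversed(layers), reversed(activations[:-1])):
        z = W @ a + b
        z = z + eps * torch.where(z >= 0, torch.ones_like(z), -torch.ones_like(z))  # stabilizer
        s = R / z
        R = a * (W.t() @ s)
    return R                                         # one relevance value per input feature
```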