10 Jun 2024 | Reduan Achtibat, Sayed Mohammad Vakilzadeh Hatefi, Maximilian Dreyer, Aakriti Jain, Thomas Wiegand, Sebastian Lapuschkin, Wojciech Samek
AttnLRP: Attention-Aware Layer-Wise Relevance Propagation for Transformers
This paper introduces AttnLRP, an extension of Layer-wise Relevance Propagation (LRP) tailored to transformer models. The goal is to provide faithful and efficient explanations for both the inputs and the latent representations of a transformer. The method addresses the challenge of attributing relevance through non-linear components such as attention, which existing methods struggle to explain. AttnLRP handles attention layers and other non-linear operations while keeping the computational cost comparable to a single backward pass.
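To make the relevance-propagation idea concrete, the following PyTorch sketch implements the standard epsilon-LRP rule for a single linear layer using one gradient call. It illustrates the general LRP mechanism only, not the paper's attention-specific propagation rules; the function name, tensor shapes, and eps value are illustrative choices.

```python
import torch
import torch.nn as nn

def lrp_epsilon_linear(layer: nn.Linear, a: torch.Tensor,
                       R_out: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Epsilon-rule LRP for one linear layer.

    Relevance R_out assigned to the layer's outputs is redistributed to the
    inputs in proportion to their contributions, using a single gradient
    call -- the same order of cost as one backward pass through the layer.
    """
    a = a.clone().requires_grad_(True)
    z = layer(a)                                      # pre-activations
    stab = eps * torch.where(z >= 0, torch.ones_like(z), -torch.ones_like(z))
    s = R_out / (z + stab)                            # stabilized quotient
    (c,) = torch.autograd.grad(z, a, grad_outputs=s)
    return (a * c).detach()                           # relevance of each input

# Toy usage: relevance flowing in from the layer above is redistributed
# onto the layer's inputs (shapes and values are placeholders).
layer = nn.Linear(8, 4)
a = torch.randn(2, 8)
R_out = torch.randn(2, 4)
R_in = lrp_epsilon_linear(layer, a, R_out)
print(R_in.shape)  # torch.Size([2, 8])
```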
The method is evaluated on several transformer-based models, including LLaMa 2, Mixtral 8x7b, Flan-T5, and vision transformers. Results show that AttnLRP outperforms existing methods in faithfulness and makes latent representations interpretable. The approach allows relevant neurons to be identified together with their role in the generation process, facilitating concept-based explanations.
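As a simple illustration of how such latent attributions might be used, the sketch below ranks hidden neurons by a relevance score to surface candidate units for concept-level inspection. The relevance vector and its size are placeholders, not values from the paper's experiments.

```python
import torch

# Hypothetical per-neuron relevance scores for one hidden layer,
# e.g. produced by an LRP-style backward pass (placeholder values here).
latent_relevance = torch.randn(4096)

# The most relevant neurons are natural candidates for concept-based analysis.
top = torch.topk(latent_relevance, k=10)
for idx, score in zip(top.indices.tolist(), top.values.tolist()):
    print(f"neuron {idx}: relevance {score:+.3f}")
```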
AttnLRP is implemented as an open-source library, providing a practical tool for interpreting transformer models. The method is particularly effective for vision transformers, where it mitigates noisy attributions and produces more accurate heatmaps. It is also efficient in both computational complexity and memory consumption, making it suitable for large-scale models.
The paper also discusses the limitations of the method, including the need for careful tuning of parameters and the impact of quantization on attributions. The work contributes to the field of explainable AI by providing a robust framework for understanding and debugging transformer-based models, which are increasingly used in critical applications such as healthcare and finance. The high computational efficiency of AttnLRP reduces energy usage and environmental impact, promoting broader adoption of explainable AI for transformers.