Attention is not Explanation


8 May 2019 | Sarthak Jain, Byron C. Wallace
Attention mechanisms are widely used in neural NLP models and are often assumed to explain model predictions, with the learned weights read as indicating which input tokens the model relies on. This paper tests that assumption in two ways: by measuring whether attention weights correlate with gradient-based feature importance measures, and by checking whether alternative attention distributions over the same inputs yield equivalent predictions. The results show that attention weights are often only weakly correlated with gradient-based importance, and that adversarial attention distributions, which differ markedly from the learned ones, can be constructed that produce effectively the same predictions. These findings challenge the common assumption that attention provides insight into model behavior. The paper concludes that, while attention mechanisms are useful for improving predictive performance, the weights produced by standard attention modules do not constitute meaningful explanations and should not be treated as such.
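The two diagnostics can be illustrated with a minimal sketch. This is not the authors' code: the tiny additive-attention classifier, the random toy input, and all names here are illustrative assumptions. It shows (1) a Kendall tau correlation between attention weights and gradient-based token importance, and (2) the change in the model's output when the attention distribution is randomly permuted.

```python
# Minimal sketch (not the paper's implementation) of the two diagnostics:
# (1) correlate attention weights with gradient-based token importance,
# (2) permute attention and measure how much the prediction changes.
import torch
import torch.nn as nn
import torch.nn.functional as F
from scipy.stats import kendalltau

class AttnClassifier(nn.Module):
    """Toy model: embedding -> additive attention over tokens -> binary classifier."""
    def __init__(self, vocab_size=1000, emb_dim=32):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.attn_score = nn.Linear(emb_dim, 1)   # additive attention scorer
        self.out = nn.Linear(emb_dim, 1)

    def forward(self, tokens, attn_override=None):
        h = self.emb(tokens)                                        # (T, d)
        alpha = F.softmax(self.attn_score(h).squeeze(-1), dim=0)    # (T,)
        if attn_override is not None:                               # swap in alternative weights
            alpha = attn_override
        ctx = (alpha.unsqueeze(-1) * h).sum(0)                      # attention-weighted context
        return torch.sigmoid(self.out(ctx)), alpha, h

torch.manual_seed(0)
model = AttnClassifier()
tokens = torch.randint(0, 1000, (12,))  # one toy "sentence" of 12 token ids

# (1) Attention vs. gradient-based importance: Kendall tau correlation.
y_hat, alpha, h = model(tokens)
h.retain_grad()                          # keep gradients on the token representations
y_hat.backward()
grad_importance = h.grad.norm(dim=-1)    # per-token gradient magnitude
tau, _ = kendalltau(alpha.detach().numpy(), grad_importance.numpy())
print(f"Kendall tau(attention, gradient importance) = {tau:.3f}")

# (2) Permuted attention: does a very different distribution change the prediction?
with torch.no_grad():
    perm_alpha = alpha.detach()[torch.randperm(len(alpha))]
    y_perm, _, _ = model(tokens, attn_override=perm_alpha)
    # Total variation distance between the two binary output distributions.
    tvd = abs(y_hat.item() - y_perm.item())
    print(f"TVD after permuting attention = {tvd:.4f}")
```

In the paper's terms, a low correlation in step (1) and a small output change in step (2) are the kind of evidence used to argue that attention weights are not faithful explanations; a trained model on real data would be substituted for the toy setup above.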