12 Jan 2024 | Stefan Blücher, Johanna Vielhaben, Nils Strodthoff
This study addresses the disagreement problem in pixel flipping (PF) benchmarks for explainable AI (XAI), which arises due to the ambiguity in occlusion strategies and the choice between removing the most influential features (MIF) or least influential features (LIF). The authors propose two complementary perspectives to resolve this issue:
1. **Reliability of Occlusion Strategies**: They introduce the Reference-out-of-model-scope (R-OMS) score to quantify the reliability of occluded samples, enabling a systematic comparison of different occlusion strategies. This score measures how much information about the original sample is still contained in the occluded samples as perceived by the model.
2. **Consistency of PF Benchmarks**: They show that the insightfulness of the MIF and LIF rankings depends on the R-OMS score in opposite directions. To leverage this, they propose the Symmetric Relevance Gain (SRG) measure, which combines both MIF and LIF measures. The SRG measure breaks the inherent connection to the underlying occlusion strategy, leading to consistent rankings across all occlusion strategies.
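To make the MIF and LIF orderings concrete, the following is a minimal sketch of pixel flipping on a toy linear model. It rests on several assumptions that are not taken from the summary: the function names (`pf_curve`, `area_under_curve`, `symmetric_gain`) are hypothetical, the occlusion strategy is a plain baseline replacement, and the two measures are combined as the difference between the areas under the LIF and MIF curves. The summary only states that SRG combines both measures, so the paper's exact definition and normalization may differ.

```python
import numpy as np

def pf_curve(model, x, relevance, baseline, order):
    """Model output while features are occluded one by one.

    order="MIF" occludes the most influential features first,
    order="LIF" the least influential features first.
    """
    ranking = np.argsort(relevance)      # ascending: least influential first
    if order == "MIF":
        ranking = ranking[::-1]          # descending: most influential first
    occluded = np.asarray(x, dtype=float).copy()
    outputs = [model(occluded)]          # output before any occlusion
    for idx in ranking:
        occluded[idx] = baseline[idx]    # assumed imputer: replace with a baseline value
        outputs.append(model(occluded))
    return np.array(outputs)

def area_under_curve(curve):
    # Trapezoidal rule with the occlusion fraction rescaled to [0, 1].
    return float(np.mean((curve[:-1] + curve[1:]) / 2.0))

def symmetric_gain(model, x, relevance, baseline):
    # Assumption: combine both orderings as LIF area minus MIF area.
    lif = area_under_curve(pf_curve(model, x, relevance, baseline, "LIF"))
    mif = area_under_curve(pf_curve(model, x, relevance, baseline, "MIF"))
    return lif - mif

# Toy setup: a linear "model" whose exact attribution is weights * x.
weights = np.array([3.0, 1.0, 0.5, 0.0])
model = lambda v: float(weights @ v)
x = np.ones(4)
baseline = np.zeros(4)

good = symmetric_gain(model, x, weights * x, baseline)   # faithful attribution
bad = symmetric_gain(model, x, -weights * x, baseline)   # reversed attribution
print(good, bad)
```

In this toy case a faithful attribution yields a positive gain and a reversed attribution yields the same value with the opposite sign, because swapping the ranking swaps the roles of the MIF and LIF curves.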
The study also provides a detailed analysis of various occlusion strategies, including different imputers, superpixel shapes, and model choices. It demonstrates that the diffusion imputer consistently yields the highest R-OMS scores, indicating reliable occlusion strategies. The SRG measure is shown to be both consistent and quantitatively stable, allowing for trustworthy PF benchmarks of XAI methods. The results highlight the importance of reliable and insightful occlusion strategies in XAI evaluations and provide a framework for improving the comparability of future studies in XAI research.