Graph Neural Network Explanations are Fragile

2024 | Jiate Li, Meng Pang, Yun Dong, Jinyuan Jia, Binghui Wang
This paper investigates the fragility of explainable graph neural networks (GNNs) under adversarial attack. We focus on perturbation-based GNN explainers, which aim to identify the subgraph that best explains the model's prediction. Our study reveals that an adversary can manipulate the graph structure so that the model's prediction remains accurate while the explanation changes drastically. We propose two attack methods: a loss-based attack and a deduction-based attack. The loss-based attack identifies important edges by analyzing how changes in edge presence affect the explanation loss. The deduction-based attack simulates the learning process of the GNN explainer to identify edges that significantly influence the explanation. Our experiments show that existing GNN explainers are vulnerable to these attacks: for instance, perturbing only 2 edges can lead to a 70% difference in the explanatory edges. The generated perturbations are also effective against other types of GNN explainers, demonstrating the generalizability of our attacks. The attacks are practical, stealthy, and faithful, as they preserve the overall graph structure and the model's accurate predictions. We evaluate them on multiple graph datasets and GNN tasks, showing attack performance significantly better than the random and kill-hot baselines. These results highlight the need for more robust GNN explainers that can withstand adversarial attacks, and they raise concerns about the reliability of GNNs in safety-critical applications such as disease diagnosis and malware detection. Future work includes designing attacks against non-perturbation-based GNN explainers and developing provably robust GNN explainers.
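
A minimal sketch of the loss-based idea described above, assuming a toy differentiable surrogate for the explainer's objective: each existing edge is removed in turn and scored by how much the explanation loss changes, and the top-scoring edges form the perturbation under a small budget. The names explanation_loss and rank_edge_flips, and the one-layer GCN-style surrogate, are illustrative assumptions, not the authors' implementation.

# Hedged sketch: loss-based ranking of edge perturbations against a
# perturbation-based GNN explainer. All names (explanation_loss,
# rank_edge_flips, budget) are illustrative, not the paper's code.
import torch

def explanation_loss(adj, feats, weight, target_class):
    # Toy stand-in for an explainer's objective: one round of mean
    # message passing, a graph-level readout, and cross-entropy on the
    # target class.
    deg = adj.sum(dim=1, keepdim=True).clamp(min=1.0)
    h = (adj / deg) @ feats @ weight          # simple message passing
    logits = h.mean(dim=0)                    # graph-level readout
    return torch.nn.functional.cross_entropy(
        logits.unsqueeze(0), torch.tensor([target_class]))

def rank_edge_flips(adj, feats, weight, target_class, budget=2):
    # Score each existing edge by how much removing it changes the
    # explanation loss (the loss-based heuristic), then return the
    # top-`budget` candidates.
    base = explanation_loss(adj, feats, weight, target_class)
    scores = []
    rows, cols = torch.triu_indices(adj.size(0), adj.size(1), offset=1)
    for i, j in zip(rows.tolist(), cols.tolist()):
        if adj[i, j] == 0:
            continue
        perturbed = adj.clone()
        perturbed[i, j] = perturbed[j, i] = 0.0   # remove edge (i, j)
        delta = (explanation_loss(perturbed, feats, weight, target_class)
                 - base).abs().item()
        scores.append(((i, j), delta))
    scores.sort(key=lambda x: x[1], reverse=True)
    return scores[:budget]

# Usage on a random toy graph (6 nodes, 4 features, 3 classes).
torch.manual_seed(0)
n, d, c = 6, 4, 3
adj = (torch.rand(n, n) > 0.5).float()
adj = ((adj + adj.t()) > 0).float()
adj.fill_diagonal_(0)
feats = torch.randn(n, d)
weight = torch.randn(d, c)
print(rank_edge_flips(adj, feats, weight, target_class=1, budget=2))

The deduction-based attack described in the abstract goes further by simulating the explainer's own optimization rather than scoring single edge flips against a fixed loss; the sketch above only illustrates the simpler loss-based heuristic.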