25 Nov 2019 | Arjun Nitin Bhagoji*, Supriyo Chakraborty, Prateek Mittal, and Seraphin Calo
This paper analyzes the vulnerability of federated learning (FL) to model poisoning attacks, where a malicious agent manipulates its local model updates to cause the global model to misclassify targeted inputs with high confidence. The authors explore various strategies for carrying out such attacks, including boosting the malicious agent's update to overcome the effects of other agents' updates, using alternating minimization to optimize for both training loss and adversarial objectives, and leveraging parameter estimation to improve attack success. They also investigate the stealthiness of these attacks by analyzing the global model's performance on validation data and the statistical properties of weight updates. The results show that even a highly constrained adversary can successfully carry out model poisoning while maintaining stealth, highlighting the need for effective defense mechanisms in FL.
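To make the explicit boosting strategy concrete, here is a minimal sketch assuming a FedAvg-style server that averages agent updates with equal weight. The function and argument names (adv_gradient, num_agents, lr) are illustrative and not the paper's exact notation; this is a sketch of the idea, not the authors' implementation.

```python
import numpy as np

def boosted_malicious_update(adv_gradient, num_agents, lr=0.1):
    # Step toward the adversarial objective (high-confidence misclassification
    # of the targeted inputs)...
    malicious_update = -lr * adv_gradient
    # ...then boost the update by the number of agents so that, after the
    # server averages it with the benign updates, the global model still
    # moves toward the adversarial target.
    return num_agents * malicious_update

def federated_average(global_weights, updates):
    # Server-side aggregation: equal-weight average of all submitted updates.
    return global_weights + np.mean(np.stack(updates), axis=0)
```

The boost factor counteracts the dilution from averaging; alternating minimization then interleaves steps on the benign training loss to keep the boosted update from degrading validation performance.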
The study considers a realistic threat model where the adversary controls a small number of malicious agents (typically one) and has no visibility into the updates provided by other agents. The adversary's goal is to ensure that the global model misclassifies a set of chosen inputs with high confidence, while also ensuring that the model converges to a point with good performance on test data. The authors demonstrate that targeted model poisoning is effective even with Byzantine-resilient aggregation mechanisms such as Krum and coordinate-wise median, which are designed to handle malicious updates.
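For reference, a minimal sketch of coordinate-wise median aggregation, one of the Byzantine-resilient mechanisms named above, assuming each agent's update is flattened into a vector. The toy data below is purely illustrative of what the defense computes.

```python
import numpy as np

def coordinate_wise_median(updates):
    # Aggregate by taking the median of every model coordinate across agents,
    # rather than the mean; `updates` is a (num_agents, num_params) array.
    return np.median(updates, axis=0)

# Illustrative toy run: nine benign updates near zero and one heavily
# boosted malicious update submitted to the same round.
rng = np.random.default_rng(0)
benign = rng.normal(0.0, 0.01, size=(9, 5))
malicious = 10.0 * np.ones((1, 5))
aggregate = coordinate_wise_median(np.vstack([benign, malicious]))
print(aggregate)
```

The median is insensitive to a single extreme coordinate, which is precisely the robustness property these aggregators rely on; the paper's finding is that targeted model poisoning can nonetheless succeed against them.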
The paper also explores the connection between model poisoning and data poisoning, showing that standard dirty-label data poisoning attacks are not effective in the FL setting. Additionally, the authors use interpretability techniques to generate visual explanations of model decisions for both benign and malicious models, finding that these explanations are nearly indistinguishable, which highlights the fragility of interpretability methods in the presence of model poisoning. Taken together, the results indicate that the global model can be manipulated to misclassify targeted inputs without detection, even under Byzantine-resilient aggregation, and the study emphasizes the need for robust defense strategies to mitigate the risks of model poisoning in federated learning.
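For contrast with direct model poisoning, here is a minimal sketch of the dirty-label data poisoning baseline, assuming the malicious agent can only alter the labels in its local dataset before ordinary local training. Function and argument names are hypothetical.

```python
import numpy as np

def dirty_label_poison(y_local, target_indices, attacker_label):
    # Keep the inputs untouched; relabel only the targeted samples with the
    # attacker's desired class, then train locally as a benign agent would.
    y_poisoned = np.array(y_local, copy=True)
    y_poisoned[target_indices] = attacker_label
    return y_poisoned
```

Because this baseline relies solely on local training to propagate the mislabeled examples, its effect is diluted by averaging with benign updates, which is consistent with the paper's observation that it is far less effective in FL than direct manipulation of the model update.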