25 Nov 2019 | Arjun Nitin Bhagoji*, Supriyo Chakraborty, Prateek Mittal, and Seraphin Calo
This paper explores the threat of *model poisoning* attacks in federated learning, where a single malicious agent aims to cause the globally trained model to misclassify a set of chosen inputs with high confidence. The authors investigate several strategies for mounting such attacks, including *explicit boosting* of the malicious update to overcome the averaging effect of the other agents' updates and *stealthy model poisoning* to evade detection by the server. They propose an *alternating minimization* strategy that alternates between optimizing the training loss and the adversarial objective, improving the attack's success rate while preserving stealth. The attacks are evaluated on two datasets (Fashion-MNIST and Adult Census) and against Byzantine-resilient aggregation mechanisms such as Krum and coordinate-wise median. The results show that even a highly constrained adversary can carry out model poisoning while remaining stealthy, highlighting the need for effective defense strategies. The paper also contrasts model poisoning with data poisoning, arguing that model poisoning is more effective in federated learning because agents share model updates rather than data, giving the attacker direct control over its contribution. Finally, the authors use interpretability techniques to show that the explanations generated for benign and malicious models are nearly indistinguishable, exposing the fragility of these techniques as a defense.
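To make the boosting and alternating-minimization ideas concrete, here is a minimal sketch, assuming a toy logistic-regression model, a boost factor equal to the number of agents, and illustrative hyperparameters; these choices (and the zero benign updates in the usage snippet) are assumptions for illustration, not the paper's exact formulation or settings.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def grad_logistic(w, X, y):
    # Gradient of the mean logistic loss for a linear model.
    p = sigmoid(X @ w)
    return X.T @ (p - y) / len(y)

def malicious_update(w_global, X_train, y_train, X_adv, y_adv, num_agents,
                     steps=50, lr=0.1, boost=None):
    """Alternate between a step on the benign training loss (kept unboosted,
    for stealth) and a step on the adversarial objective, whose gradient is
    explicitly boosted so it survives the server's averaging."""
    boost = num_agents if boost is None else boost
    w = w_global.copy()
    for _ in range(steps):
        # Step 1: benign loss on the agent's local data shard (stealth component).
        w -= lr * grad_logistic(w, X_train, y_train)
        # Step 2: adversarial objective on the attacker-chosen inputs, boosted.
        w -= lr * boost * grad_logistic(w, X_adv, y_adv)
    # Return the weight delta the malicious agent reports to the server.
    return w - w_global

# Toy usage: 20-dimensional data, 10 agents, 5 attacker-chosen inputs.
d, num_agents = 20, 10
w_global = np.zeros(d)
X_train = rng.normal(size=(100, d))
y_train = (X_train[:, 0] > 0).astype(float)
X_adv = rng.normal(size=(5, d))
y_adv = np.ones(5)  # labels the attacker wants the global model to output

delta_mal = malicious_update(w_global, X_train, y_train, X_adv, y_adv, num_agents)
# FedAvg-style aggregation, with the benign deltas taken as zero for this toy.
w_new = w_global + delta_mal / num_agents
print("confidence on attacker-chosen inputs:", sigmoid(X_adv @ w_new))
```

The key design point the sketch tries to capture is that only the adversarial gradient step is boosted: after the server's 1/num_agents averaging, the malicious objective lands at roughly full strength while the unboosted benign step keeps the reported update looking ordinary.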