Poison Frogs! Targeted Clean-Label Poisoning Attacks on Neural Networks

10 Nov 2018 | Ali Shafahi, W. Ronny Huang, Mahyar Najibi, Octavian Suciu, Christoph Studer, Tudor Dumitras, Tom Goldstein
This paper explores targeted clean-label poisoning attacks on neural networks, in which an attacker injects correctly labeled examples into the training set to control the classifier's behavior on specific test instances without degrading its overall accuracy. Because the attacker never needs to control the labeling function, the poisons are difficult to detect and the attacks remain effective in realistic scenarios where training data is scraped from the web. The authors propose an optimization-based method for crafting poison images and show that, in transfer learning settings, a single poison image suffices to control the classifier's behavior on a chosen target. For end-to-end training, they introduce a "watermarking" strategy that blends a low-opacity target image into multiple poisoned training instances to achieve reliable poisoning. Experiments on the CIFAR-10 dataset demonstrate high success rates in manipulating image classifiers. The paper closes by discussing the implications of these attacks and the importance of training-data reliability and provenance.
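To make the crafting procedure concrete, the sketch below illustrates the feature-collision objective the paper optimizes, p = argmin_x ||f(x) - f(t)||^2 + beta * ||x - b||^2, where f is a feature extractor (penultimate-layer activations), t is the target instance, and b is the base instance whose label the poison keeps. This is a minimal PyTorch sketch assuming such an f is available; the function names, hyperparameter values, and the pixel clamp are illustrative assumptions, not the authors' released code.

```python
import torch

def watermark(target, base, opacity=0.3):
    # Low-opacity blend of the target into a base image; the paper's
    # "watermarking" variant applies a blend like this to the bases
    # used for poisoning under end-to-end training. The 0.3 opacity
    # is one plausible setting, chosen here for illustration.
    return opacity * target + (1 - opacity) * base

def craft_poison(feature_extractor, target, base, beta=0.25,
                 lr=0.01, steps=1000):
    """Craft one clean-label poison by feature collision.

    Approximately solves
        p = argmin_x ||f(x) - f(t)||^2 + beta * ||x - b||^2
    with a forward-backward splitting iteration: a gradient step on
    the feature-space distance, then a closed-form proximal step for
    the input-space penalty that keeps the poison close to `base`.
    """
    target_feat = feature_extractor(target).detach()
    x = base.clone().detach().requires_grad_(True)
    for _ in range(steps):
        # Forward step: move x toward the target in feature space.
        loss = (feature_extractor(x) - target_feat).pow(2).sum()
        grad, = torch.autograd.grad(loss, x)
        x = x.detach() - lr * grad
        # Backward (proximal) step: pull x back toward the base image
        # so the poison stays visually indistinguishable from it.
        x = (x + lr * beta * base) / (1 + lr * beta)
        # Clamp to a valid pixel range (an assumption; inputs here are
        # taken to be normalized to [0, 1]).
        x = x.clamp(0, 1).requires_grad_(True)
    return x.detach()
```

Under transfer learning, where only the final layer is retrained, a single such poison can pull the decision boundary so the target is misclassified as the base class; under end-to-end training the feature extractor itself shifts, which is why the paper combines the watermark blend with multiple poison instances.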