10 Nov 2018 | Ali Shafahi, W. Ronny Huang, Mahyar Najibi, Octavian Suciu, Christoph Studer, Tudor Dumitras, Tom Goldstein
This paper introduces clean-label poisoning attacks on neural networks: attacks that manipulate classifier behavior without requiring the attacker to control the labeling of the training data. The attacks are targeted, aiming to control the classifier's prediction on one specific test instance without degrading overall accuracy. The proposed method uses an optimization-based procedure to generate poison instances that collide with the target instance in feature space while remaining visually close to a correctly labeled base-class image; once the classifier is retrained on data containing these poisons, the decision boundary shifts so that the target is misclassified as the base class. Because the poisons carry plausible labels, the attack works even when the attacker has no control over the labeling process: the poison images can simply be left on the web and picked up by data-collection bots.
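To make the optimization concrete, here is a minimal PyTorch sketch of the feature-collision objective described above, minimizing ||f(x) − f(t)||² + β·||x − b||² with the forward-backward splitting scheme the paper describes (a gradient step on the feature term, then a closed-form proximal step on the image term). The function name, the `feature_extractor` argument, and the hyperparameter defaults are illustrative assumptions, not the authors' exact settings.

```python
import torch

def craft_poison(feature_extractor, base_img, target_img,
                 beta=0.25, lr=0.01, n_iters=1000):
    """Craft a poison whose features collide with the target's while the
    image stays close to the base image in pixel space.

    base_img, target_img: tensors of shape (1, C, H, W) with values in [0, 1].
    feature_extractor: maps an image batch to penultimate-layer features.
    """
    feature_extractor.eval()
    with torch.no_grad():
        target_feat = feature_extractor(target_img)  # f(t), held fixed

    x = base_img.clone().detach()
    for _ in range(n_iters):
        x.requires_grad_(True)
        feat_loss = (feature_extractor(x) - target_feat).pow(2).sum()
        grad, = torch.autograd.grad(feat_loss, x)
        with torch.no_grad():
            # forward step: move toward the target in feature space
            x = x - lr * grad
            # backward (proximal) step: pull back toward the base image
            x = (x + lr * beta * base_img) / (1 + lr * beta)
            # keep a valid image
            x = x.clamp(0.0, 1.0)
    return x.detach()
```

Here β controls how strongly the poison is pulled back toward the base image: larger values keep it visually closer to the base at the cost of a weaker feature collision, so it would need tuning per model and input resolution.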
The paper demonstrates the attacks in both transfer learning and end-to-end training scenarios. In the transfer learning setting, where only the final layer is retrained on top of a frozen feature extractor, a single poison instance is enough to make the classifier misclassify the target, with a 100% success rate in the paper's experiments. End-to-end training is harder: multiple poison instances are required, and the authors strengthen the attack with a "watermarking" technique that blends a low-opacity copy of the target image into each poison image. This increases feature overlap with the target while the poisons still visually resemble their base class and keep their correct labels.
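The watermarking step itself is just a per-pixel blend. The sketch below assumes images are tensors with values in [0, 1]; the 30% default opacity is in the range the paper discusses, but the exact value is an assumption and trades attack strength against how visible the watermark is.

```python
import torch

def add_watermark(base_img, target_img, opacity=0.30):
    """Blend a low-opacity copy of the target image into a base/poison image,
    increasing feature overlap with the target while the result still mostly
    looks like the base class."""
    return (opacity * target_img + (1.0 - opacity) * base_img).clamp(0.0, 1.0)
```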
The paper also shows that clean-label attacks are more effective when the target instance is an outlier, lying far from the other training examples of its class in feature space, since such targets are easier to push across the decision boundary. Success rates grow with the number of poison instances: in the end-to-end training experiments, roughly 50 watermarked poisons achieve about a 60% success rate. The paper concludes that clean-label poisoning attacks are a practical threat to neural networks trained on data gathered from untrusted sources, and that further research is needed to develop effective defenses against them.