ENSEMBLE ADVERSARIAL TRAINING: ATTACKS AND DEFENSES

26 Apr 2020 | Florian Tramèr, Alexey Kurakin, Nicolas Papernot, Ian Goodfellow, Dan Boneh, Patrick McDaniel
Adversarial examples are inputs crafted to fool machine learning models. Adversarial training improves robustness by injecting such examples into the training data. When the perturbations are generated with fast single-step methods such as FGSM, however, training converges to a degenerate global minimum: the model learns to make the single-step attack produce weak perturbations rather than learning to resist strong ones. As a result, adversarially trained models remain vulnerable to black-box attacks that transfer perturbations computed on other models, and to a new single-step attack that escapes the non-smooth vicinity of a data point via a small random step.

Ensemble Adversarial Training addresses this by augmenting the training data with perturbations transferred from other, pre-trained models. On ImageNet, it yields models with markedly stronger robustness to black-box attacks; one such model won the first round of the NIPS 2017 competition on Defenses against Adversarial Attacks. Subsequent work has shown, however, that more sophisticated black-box attacks still transfer well enough to reduce accuracy substantially.

The weakness of single-step adversarial training stems from gradient masking: the trained model's loss surface becomes sharply curved near data points, so the linear approximation on which single-step attacks rely is degraded and the attack no longer finds the perturbations the model is actually vulnerable to. Gradient masking also undermines several other defensive techniques.
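To make the single-step setup concrete, here is a minimal sketch of FGSM-based adversarial training, assuming a PyTorch classifier `model`, inputs scaled to [0, 1], and a cross-entropy loss; the names `fgsm_perturb`, `adversarial_training_step`, and the epsilon value are illustrative, not taken from the paper.

```python
import torch
import torch.nn.functional as F

def fgsm_perturb(model, x, y, epsilon):
    """Fast Gradient Sign Method: one step along the sign of the input gradient."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    grad, = torch.autograd.grad(loss, x_adv)
    # Move each pixel by epsilon in the direction that increases the loss,
    # then clip back to the valid input range.
    return (x_adv + epsilon * grad.sign()).clamp(0.0, 1.0).detach()

def adversarial_training_step(model, optimizer, x, y, epsilon=8 / 255):
    """One batch of single-step adversarial training: train on a mix of
    clean examples and FGSM examples crafted on the current model."""
    x_adv = fgsm_perturb(model, x, y, epsilon)
    optimizer.zero_grad()
    loss = 0.5 * (F.cross_entropy(model(x), y) + F.cross_entropy(model(x_adv), y))
    loss.backward()
    optimizer.step()
    return loss.item()
```

Because the perturbation is computed from the trained model's own current gradients, the model can lower this training loss by making the one-step linear approximation uninformative near the data (gradient masking) instead of becoming genuinely robust, which is the failure mode described above.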
Ensemble Adversarial Training decouples the generation of adversarial examples from the model being trained, which increases the diversity of perturbations seen during training and improves robustness to black-box transfer attacks. The paper also introduces R+FGSM, a single-step attack that prepends a small random step to FGSM and outperforms plain single-step attacks against adversarially trained models. Evaluated on ImageNet, Ensemble Adversarial Training shows clearly improved robustness to black-box attacks, although more advanced attacks developed in subsequent work can bypass part of this defense, and the scale of the task limits what current defenses achieve. The paper concludes that decoupling adversarial example generation from the trained model strengthens adversarial training, and that Ensemble Adversarial Training is a promising approach for improving robustness to black-box adversarial attacks.
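The R+FGSM attack can be sketched in the same style: a small random step of size alpha is taken first, then an FGSM step spends the remaining budget from the new point (alpha = epsilon/2 is a typical choice). This is a sketch under the same assumptions as above, not a reference implementation.

```python
import torch
import torch.nn.functional as F

def rfgsm_perturb(model, x, y, epsilon, alpha=None):
    """R+FGSM: a random step of size alpha, then an FGSM step of size epsilon - alpha."""
    alpha = epsilon / 2 if alpha is None else alpha
    # The random step moves off the data point, where the loss surface of an
    # adversarially trained model is artificially non-smooth.
    x_rand = (x + alpha * torch.randn_like(x).sign()).clamp(0.0, 1.0)
    x_rand = x_rand.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_rand), y)
    grad, = torch.autograd.grad(loss, x_rand)
    return (x_rand + (epsilon - alpha) * grad.sign()).clamp(0.0, 1.0).detach()
```

A corresponding Ensemble Adversarial Training step can be sketched as follows: each batch's perturbation is crafted on a model drawn from a fixed set of pre-trained source models, or on the model being trained, so example generation is decoupled from the current parameters. The uniform sampling and loss weighting here are illustrative assumptions, not the exact schedule from the paper; the sketch reuses the hypothetical `fgsm_perturb` helper above.

```python
import random

def ensemble_adv_training_step(model, static_models, optimizer, x, y, epsilon=8 / 255):
    """One batch of Ensemble Adversarial Training: adversarial examples are crafted
    on a randomly chosen pre-trained source model (or the model being trained)."""
    source = random.choice(list(static_models) + [model])
    x_adv = fgsm_perturb(source, x, y, epsilon)  # reuses the FGSM helper above
    optimizer.zero_grad()
    loss = 0.5 * (F.cross_entropy(model(x), y) + F.cross_entropy(model(x_adv), y))
    loss.backward()
    optimizer.step()
    return loss.item()
```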