26 Apr 2020 | Florian Tramèr, Alexey Kurakin, Nicolas Papernot, Ian Goodfellow, Dan Boneh, Patrick McDaniel
The paper discusses the limitations of adversarial training, a technique for hardening machine learning models against adversarial examples. The authors show that adversarial training with single-step attack methods converges to a degenerate global minimum: the model learns a loss surface with sharp curvature near the data points, so the linear approximation underlying single-step attacks becomes poor. The single-step attack then generates weak perturbations against the trained model itself, yielding a false sense of robustness rather than a strong defense. The paper introduces Ensemble Adversarial Training, which augments the training data with perturbations crafted against other, static pre-trained models, decoupling the generation of adversarial examples from the model being trained. This increases the diversity of perturbations seen during training and improves robustness to black-box attacks.
The authors demonstrate that their method outperforms previous approaches both in their own experimental evaluation and in a competition on defenses against adversarial attacks. However, subsequent work has shown that more sophisticated black-box attacks can substantially improve the transferability of adversarial examples, reducing the accuracy of models trained with Ensemble Adversarial Training.
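The core augmentation step can be sketched in a toy setting. This is a minimal illustration, not the paper's setup: the "pre-trained" source models here are hypothetical logistic-regression models with fixed weights, and FGSM stands in for the single-step attack. The key point is that the perturbations are computed from the frozen source models, never from the model being trained.

```python
import numpy as np

def fgsm(w, b, x, y, eps):
    """One-step FGSM perturbation crafted against a frozen
    logistic-regression 'source' model with weights (w, b)."""
    z = x @ w + b
    p = 1.0 / (1.0 + np.exp(-z))      # sigmoid prediction
    grad_x = (p - y) * w              # gradient of cross-entropy loss w.r.t. x
    return x + eps * np.sign(grad_x)  # step that increases the source's loss

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 5))
y = (X[:, 0] > 0).astype(float)

# Two frozen "pre-trained" source models (random weights, for brevity only).
sources = [(rng.normal(size=5), 0.0), (rng.normal(size=5), 0.0)]

# Augment the clean data with perturbations from the source models,
# decoupled from whatever model will be trained on X_aug / y_aug.
eps = 0.1
X_adv = np.stack([fgsm(*sources[i % 2], X[i], y[i], eps) for i in range(len(X))])
X_aug = np.concatenate([X, X_adv])
y_aug = np.concatenate([y, y])
```

The trained model would then minimize its loss on the augmented set, seeing perturbations whose directions it cannot degenerately overfit to, since they come from other models' gradients.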