9 Sep 2019 | Dimitris Tsipras*, Shibani Santurkar*, Logan Engstrom*, Alexander Turner, Aleksander Madry
The paper explores the tension between adversarial robustness and standard accuracy in machine learning models. It demonstrates that training models to be robust against adversarial perturbations can reduce standard accuracy, even in simple, natural settings. The trade-off arises because robust classifiers learn fundamentally different feature representations than standard classifiers: standard training can exploit weakly predictive but easily perturbed features, while robust training must forgo them. The paper also highlights unexpected benefits of adversarially robust models, such as feature representations that align better with human perception and the ability to produce smooth inter-class interpolations resembling those from generative models like GANs. The findings suggest that the interplay between adversarial robustness and standard classification is more nuanced than previously thought, and that further research is needed to understand the relative costs and benefits of each approach.
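The trade-off can be illustrated with a toy distribution in the spirit of the simple setting the paper analyzes: a label correlated with many weakly predictive features. The sketch below is illustrative (the dimension `d`, correlation strength `eta`, and sample size are arbitrary choices, not the paper's exact construction). A standard linear classifier that averages the weak features is very accurate on clean data, yet a small ℓ∞ perturbation of size 2·eta flips every one of its predictions in expectation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Label y in {-1, +1}; d weakly correlated "non-robust" features x_i ~ N(eta*y, 1).
# Parameter values are illustrative, not taken from the paper.
d, eta, n = 100, 0.2, 10_000
y = rng.choice([-1.0, 1.0], size=n)
X = rng.normal(eta * y[:, None], 1.0, size=(n, d))

# A standard classifier that averages the weak features is highly
# accurate on clean data (the per-sample mean concentrates around eta*y).
pred_clean = np.sign(X.mean(axis=1))
clean_acc = (pred_clean == y).mean()

# But an l_inf-bounded perturbation of size 2*eta, shifting every feature
# against the true label, moves the mean to -eta*y and destroys accuracy.
X_adv = X - 2 * eta * y[:, None]
pred_adv = np.sign(X_adv.mean(axis=1))
adv_acc = (pred_adv == y).mean()

print(f"clean accuracy: {clean_acc:.3f}")        # high, near 1
print(f"adversarial accuracy: {adv_acc:.3f}")    # low, near 0
```

A classifier that ignored the weak features entirely would be robust to this perturbation but would have to give up the accuracy they provide on clean data, which is the essence of the trade-off the paper formalizes.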