9 Sep 2019 | Dimitris Tsipras*, Shibani Santurkar*, Logan Engstrom*, Alexander Turner, Aleksander Madry
Robustness and accuracy may be at odds. This paper shows that training models to be adversarially robust can reduce their standard accuracy, and that there is a fundamental trade-off between the two: the trade-off is provable in simple settings and corroborated empirically in more complex ones. Robust classifiers also learn different feature representations than standard ones, representations that align better with salient data characteristics and human perception. Adversarial training, while effective, increases training time and may require more data; in return, it can yield unexpected benefits, such as adversarial examples that exhibit salient data characteristics and clean inter-class feature interpolations similar to those produced by generative models. These findings highlight the need for specialized techniques in adversarially robust learning and suggest a potentially deeper connection between generative adversarial networks and adversarial robustness.
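The provable trade-off in a simple setting can be illustrated with a small simulation. Below is a minimal sketch in the spirit of the paper's toy distribution: one robust but imperfect feature plus many weakly correlated non-robust features. The specific parameter values, training loop, and logistic-regression setup here are illustrative choices, not the paper's exact construction (it needs only numpy):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy distribution in the spirit of the paper's simple setting
# (parameters are illustrative, not the paper's exact values):
# y in {-1,+1}; x1 = y with prob p (robust but imperfect feature);
# x2..x_{d+1} ~ N(eta*y, 1) (weak but numerous non-robust features).
n, d, p, eta, eps = 5000, 50, 0.9, 0.2, 0.5

def sample(n):
    y = rng.choice([-1.0, 1.0], size=n)
    x1 = np.where(rng.random(n) < p, y, -y)
    weak = rng.normal(eta * y[:, None], 1.0, size=(n, d))
    return np.column_stack([x1, weak]), y

def train(X, y, eps=0.0, steps=500, lr=0.1):
    """Logistic regression; for a linear model the exact l_inf
    worst-case perturbation is x - eps * y * sign(w)."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        Xa = X - eps * y[:, None] * np.sign(w)   # adversarial inputs
        m = y * (Xa @ w)                         # margins
        g = -(y[:, None] * Xa / (1 + np.exp(m))[:, None]).mean(0)
        w -= lr * g
    return w

def accuracy(w, X, y, eps=0.0):
    Xa = X - eps * y[:, None] * np.sign(w)
    return float((np.sign(Xa @ w) == y).mean())

Xtr, ytr = sample(n)
Xte, yte = sample(n)
w_std = train(Xtr, ytr, eps=0.0)   # standard training
w_rob = train(Xtr, ytr, eps=eps)   # adversarial training

print("standard model: clean %.2f, adv %.2f" %
      (accuracy(w_std, Xte, yte), accuracy(w_std, Xte, yte, eps)))
print("robust model:   clean %.2f, adv %.2f" %
      (accuracy(w_rob, Xte, yte), accuracy(w_rob, Xte, yte, eps)))
```

The standard model exploits the many weak features and attains higher clean accuracy, but an l_inf adversary with budget eps > eta can flip all of those features at once, collapsing its adversarial accuracy; the robust model leans on the single robust feature, trading some clean accuracy for robustness.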