4 Sep 2019 | Aleksander Madry*, Aleksandar Makelov*, Ludwig Schmidt*, Dimitris Tsipras*, Adrian Vladu*
This paper addresses the issue of adversarial robustness in deep neural networks, which are vulnerable to inputs that are almost indistinguishable from natural data yet classified incorrectly. The authors propose a robust optimization approach to study and improve adversarial robustness. They formulate the problem as a saddle point (min-max) optimization: minimize the expected loss under worst-case, norm-bounded perturbations of the input. This formulation provides a principled framework for understanding and defending against adversarial attacks. The key contributions include:
1. **Experimental Study**: They conduct an experimental study of the optimization landscape of the saddle point formulation, finding that the underlying problem is tractable despite being neither convex nor concave. They provide evidence that first-order methods, in particular projected gradient descent (PGD), reliably solve the inner maximization (see the sketch after this list).
2. **Network Architecture Impact**: They explore the impact of network capacity on adversarial robustness, finding that reliably withstanding strong adversarial attacks requires significantly larger networks than classifying benign inputs alone.
3. **Robust Network Training**: They train networks on the MNIST and CIFAR10 datasets to achieve robustness against a wide range of adversarial attacks. Their best MNIST model achieves over 89% accuracy against the strongest adversaries, and their CIFAR10 model achieves 46% accuracy. These results suggest that secure neural networks are within reach.
4. **Security Guarantee**: They introduce the notion of security against a "first-order adversary", i.e., an adversary that relies only on gradient information about the loss. Since essentially all current attacks are first-order methods, robustness against such adversaries constitutes a broad security guarantee and a strong foundation for fully resistant deep learning models.
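To make the saddle point formulation concrete, below is a minimal sketch of PGD-based adversarial training, assuming a PyTorch classifier with inputs scaled to [0, 1]; the function names and hyperparameters (`eps`, `alpha`, `steps`) are illustrative placeholders, not the authors' released implementation. The inner loop approximately solves the maximization over perturbations within an l-infinity ball of radius `eps` by repeatedly stepping in the sign of the input gradient and projecting back onto the ball; the outer step then minimizes the training loss on the resulting adversarial examples.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=0.3, alpha=0.01, steps=40):
    """Approximate inner maximization: PGD on the loss, constrained to an
    l_inf ball of radius eps around the clean input x."""
    # Random start inside the perturbation set, clipped to the valid input range.
    x_adv = torch.clamp(x + torch.empty_like(x).uniform_(-eps, eps), 0.0, 1.0)
    for _ in range(steps):
        x_adv = x_adv.detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)
        # Ascent step in the sign of the gradient, then project back onto
        # the l_inf ball around x and the valid input range.
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps)
        x_adv = torch.clamp(x_adv, 0.0, 1.0)
    return x_adv.detach()

def adversarial_training_step(model, optimizer, x, y, eps=0.3):
    """Outer minimization: one optimizer step on the loss at PGD examples."""
    x_adv = pgd_attack(model, x, y, eps=eps)
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```

For reference, the paper's experiments use an l-infinity perturbation budget of ε = 0.3 on MNIST (pixels in [0, 1]) and ε = 8/255 on CIFAR10.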
The paper also discusses related work on adversarial examples and adversarial training, highlighting how its approach differs from and improves on prior methods. The authors conclude that their findings provide evidence that deep neural networks can be made resistant to adversarial attacks, and they encourage further research on improving the robustness of these models.