Certifying Some Distributional Robustness with Principled Adversarial Training


1 May 2020 | Aman Sinha*, Hongseok Namkoong*, Riccardo Volpi, John Duchi
**Summary:** This paper introduces a principled approach to adversarial training that certifies a level of robustness against adversarial perturbations. The method is based on distributionally robust optimization (DRO): the model is trained against the worst-case distribution within a Wasserstein ball around the data-generating distribution. Because the exactly constrained problem is intractable for general nonconvex losses, the authors work with a Lagrangian relaxation, which yields a training procedure that augments model-parameter updates with worst-case perturbations of the training data.

The key insight is that, for smooth losses and a sufficiently large penalty parameter, the inner maximization defining the robust surrogate loss is strongly concave, so the worst-case perturbation can be computed efficiently and the surrogate can be optimized with standard stochastic gradient methods. The procedure comes with statistical guarantees on the population loss, certifying performance under the worst-case distribution within the Wasserstein ball, and the authors show that it matches or outperforms heuristic adversarial training methods, particularly for imperceptible perturbations. The computational burden of adversarial training is kept modest by exploiting the smoothness of neural networks with smooth activations, which makes the worst-case perturbations efficiently computable; the authors also prove convergence guarantees for the resulting method and show that it generalizes to test data, so the certificate extends to attacks on unseen examples. The relaxed objective and its robust surrogate are sketched below.
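Concretely, with $P_0$ the data-generating distribution, $W_c$ the Wasserstein distance induced by a transport cost $c$, and $\gamma \ge 0$ the Lagrangian penalty, the relaxation replaces the constrained DRO problem with an ordinary expectation of a per-example robust surrogate. The sketch below follows the paper's formulation but omits regularity conditions:

```latex
% Constrained Wasserstein DRO problem
\min_{\theta} \; \sup_{P \,:\, W_c(P, P_0) \le \rho} \; \mathbb{E}_{Z \sim P}\big[\ell(\theta; Z)\big]

% Lagrangian relaxation: an expectation of a robust surrogate loss under P_0
\min_{\theta} \; \mathbb{E}_{Z_0 \sim P_0}\big[\phi_{\gamma}(\theta; Z_0)\big],
\qquad
\phi_{\gamma}(\theta; z_0) \;=\; \sup_{z} \Big\{ \ell(\theta; z) \;-\; \gamma\, c(z, z_0) \Big\}
```

For a loss that is smooth in $z$ and a strongly convex cost such as $c(z, z_0) = \lVert z - z_0 \rVert_2^2$, choosing $\gamma$ larger than the smoothness constant makes the inner supremum strongly concave, which is why it can be solved reliably by gradient ascent.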
In supervised learning settings, the transport cost is chosen so that only the feature vectors may be perturbed while the labels are left unchanged. The paper also provides bounds on the Lipschitz constants of neural networks with smooth activation functions; these bounds govern how large the penalty parameter must be for the inner maximization to remain well behaved, and hence for the adversarial training procedure to be effective. Overall, the proposed method offers a principled and computationally efficient way to train robust neural networks, with theoretical guarantees and empirical validation across a range of adversarial attack scenarios.
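To make the training loop concrete, here is a minimal sketch in PyTorch of the procedure described above: an inner gradient-ascent loop approximates the worst-case feature perturbation for the penalized surrogate (assuming a squared-Euclidean transport cost on features), followed by an ordinary optimizer step on the perturbed batch. The function names, penalty value, number of ascent steps, and step sizes are illustrative assumptions, not the paper's reference implementation or hyperparameters.

```python
import torch


def robust_surrogate_perturbation(model, loss_fn, x0, y, gamma, steps=15, lr=0.1):
    """Approximate  argmax_x  loss(model(x), y) - gamma * ||x - x0||^2
    by gradient ascent; this is the inner maximization of the robust
    surrogate. Only the features are perturbed, the labels y stay fixed.
    (gamma, steps, lr are illustrative values, not from the paper.)"""
    x = x0.clone().detach().requires_grad_(True)
    for _ in range(steps):
        transport_cost = ((x - x0) ** 2).flatten(1).sum(dim=1).mean()
        objective = loss_fn(model(x), y) - gamma * transport_cost
        grad, = torch.autograd.grad(objective, x)
        with torch.no_grad():
            x += lr * grad  # ascent step on the penalized objective
    return x.detach()


def robust_training_step(model, loss_fn, optimizer, x0, y, gamma):
    """One stochastic step of the robust training loop: compute an
    approximate worst-case perturbation of the batch, then take an
    ordinary optimizer step on the perturbed examples."""
    model.eval()  # keep e.g. batch-norm statistics fixed while perturbing
    x_adv = robust_surrogate_perturbation(model, loss_fn, x0, y, gamma)
    model.train()
    optimizer.zero_grad()
    loss = loss_fn(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```

Note that the certificate discussed above presumes the penalty is large enough that the inner problem is strongly concave; with a penalty that is too small, the ascent loop in this sketch is only a heuristic.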