31 Oct 2020 | Aditi Raghunathan, Jacob Steinhardt & Percy Liang
This paper presents a method for certifying the robustness of neural networks against adversarial examples. The authors propose a technique that computes an upper bound on the worst-case error of a neural network under adversarial perturbations. The bound is derived from a semidefinite relaxation, which makes it efficient to compute and yields a certificate of robustness. Applied to two-layer neural networks on MNIST, the method produces both a network and a certificate that no adversarial attack with perturbations of size ε = 0.1 can cause more than 35% test error.
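To give a flavor of the relaxation involved (this is a minimal sketch, not the authors' exact formulation), the certification problem reduces to a semidefinite program roughly of the form max ⟨M, P⟩ subject to P ⪰ 0 and diag(P) ≤ 1, where M is built from the network weights and the input; here M is just a random symmetric placeholder so the snippet runs standalone:

```python
# Minimal sketch of an SDP relaxation of the kind used for certification,
# solved with cvxpy. The matrix M is a random stand-in for the quantity
# that, in the paper, encodes the network weights and the test input.
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
d = 10                                 # dimension of the relaxed variable
A = rng.standard_normal((d, d))
M = (A + A.T) / 2                      # symmetric placeholder matrix

P = cp.Variable((d, d), symmetric=True)
constraints = [P >> 0,                 # P must be positive semidefinite
               cp.diag(P) <= 1]        # relaxation of the box constraint z in [-1, 1]^d

# The optimal value upper-bounds the original (intractable) quadratic
# maximization over perturbations, and hence serves as a certificate.
problem = cp.Problem(cp.Maximize(cp.trace(M @ P)), constraints)
upper_bound = problem.solve(solver=cp.SCS)
print(f"SDP upper bound: {upper_bound:.4f}")
```

Because the relaxation is convex, any feasible dual solution also gives a valid (if looser) bound, which is what makes the certificate cheap to evaluate at training time.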
The paper discusses the challenges of defending against adversarial examples, noting that existing defenses often fail against new, stronger attacks. The proposed method addresses this by jointly optimizing the network and the certificate, which acts as an adaptive regularizer that encourages robustness against all adversarial attacks. The method is evaluated on the MNIST dataset, where it outperforms other methods in terms of the tightness of the upper bound on adversarial error.
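Conceptually, the joint optimization adds the differentiable certificate to the training loss, so the network is penalized whenever its bound on adversarial error grows. The PyTorch sketch below only illustrates this training pattern: `certificate_bound` is a placeholder surrogate (a spectral-style penalty), not the paper's eigenvalue-based dual bound, and the data batch is synthetic.

```python
# Hedged sketch of training with a robustness certificate as a regularizer.
# certificate_bound below is a placeholder surrogate; the paper's bound is
# tighter and comes from a dual of the semidefinite relaxation.
import torch
import torch.nn as nn

d_in, d_hidden, n_classes = 784, 500, 10
W1 = nn.Parameter(torch.randn(d_hidden, d_in) * 0.01)
b1 = nn.Parameter(torch.zeros(d_hidden))
W2 = nn.Parameter(torch.randn(n_classes, d_hidden) * 0.01)
b2 = nn.Parameter(torch.zeros(n_classes))

def forward(x):
    # Two-layer ReLU network, matching the setting studied in the paper.
    return torch.relu(x @ W1.T + b1) @ W2.T + b2

def certificate_bound(eps):
    # Placeholder surrogate: a spectral-norm penalty on the product of the
    # layer weights, scaled by the perturbation size. It stands in for the
    # paper's certificate purely to show how the bound enters the loss.
    return eps * torch.linalg.matrix_norm(W2 @ W1, ord=2)

optimizer = torch.optim.Adam([W1, b1, W2, b2], lr=1e-3)
criterion = nn.CrossEntropyLoss()
eps, reg_coeff = 0.1, 0.05

x = torch.rand(64, d_in)                     # dummy batch for illustration
y = torch.randint(0, n_classes, (64,))

optimizer.zero_grad()
loss = criterion(forward(x), y) + reg_coeff * certificate_bound(eps)
loss.backward()                              # gradients flow through the bound,
optimizer.step()                             # so it acts as an adaptive regularizer
```

The key design point is that the bound is differentiable in the weights, so minimizing it during training tightens the certificate rather than leaving it as a post-hoc check.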
The authors also compare their method with other approaches, including those based on linear programming and spectral norms. They show that their method provides tighter bounds and is more effective in certifying robustness. The paper concludes that their approach is the first to demonstrate a certifiable, trainable, and scalable method for defending against adversarial examples on two-layer networks. The results show that their method produces a network with a 4.2% test error on clean data and a certificate that no adversarial attack can misclassify more than 35% of the test examples using ε = 0.1 perturbations.