CERTIFIED DEFENSES AGAINST ADVERSARIAL EXAMPLES

31 Oct 2020 | Aditi Raghunathan, Jacob Steinhardt & Percy Liang
This paper addresses neural networks' vulnerability to adversarial perturbations: small, carefully crafted input changes that can significantly degrade accuracy. The authors propose a method for producing *certificates of robustness* for neural networks with one hidden layer. The certificate is based on a semidefinite relaxation that outputs an upper bound on the adversarial loss of a given network at a given input, guaranteeing that no attack within the allowed perturbation budget can cause more than a certain error. Because this upper bound is differentiable, it can be optimized jointly with the network parameters, acting as an *adaptive regularizer* that encourages robustness against all attacks rather than any single known one. On the MNIST dataset, the approach produces a network and a certificate guaranteeing that no attack perturbing each pixel by at most 0.1 (on the [0, 1] pixel scale) can cause more than 35% test error. The paper also discusses the limitations of existing defenses, which are typically evaluated only against specific known attacks, and compares the proposed method with other approaches, demonstrating its effectiveness and scalability.
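To make the relaxation concrete, below is a minimal sketch of the kind of semidefinite program that yields such a certificate, written with cvxpy. This is an illustration under stated assumptions, not the authors' implementation: the function name `sdp_certificate`, the toy random network, and the exact encoding are choices made here. The encoding follows the standard quadratic description of the ReLU (s >= 0, s >= Wx', s(s - Wx') = 0) together with an l-infinity box around the input, lifted into a positive semidefinite matrix.

```python
import numpy as np
import cvxpy as cp

def sdp_certificate(W, c, x, eps):
    """Upper-bound max_{||x' - x||_inf <= eps} c^T relu(W x') via an SDP.

    W: (m, d) hidden-layer weights; c: (m,) margin vector (e.g. V^T (e_j - e_i));
    x: (d,) clean input; eps: l_inf perturbation radius.
    """
    m, d = W.shape
    n = 1 + d + m                       # lifted vector z = [1; x'; s]
    P = cp.Variable((n, n), symmetric=True)
    l, u = x - eps, x + eps             # l_inf box around the clean input

    ix = slice(1, 1 + d)                # block of P corresponding to x'
    js = slice(1 + d, n)                # block of P corresponding to s = relu(W x')

    cons = [P >> 0, P[0, 0] == 1]
    # Box: (x'_i - l_i)(u_i - x'_i) >= 0, written linearly in the lifted matrix.
    cons += [cp.diag(P[ix, ix]) <= cp.multiply(l + u, P[0, ix]) - l * u]
    # ReLU: s >= 0, s >= W x', and s .* (s - W x') = 0, again linearized in P.
    cons += [P[0, js] >= 0,
             P[0, js] >= W @ P[0, ix],
             cp.diag(P[js, js]) == cp.diag(W @ P[ix, js])]

    prob = cp.Problem(cp.Maximize(c @ P[0, js]), cons)
    prob.solve(solver=cp.SCS)           # SCS handles semidefinite cones
    return prob.value                   # certified upper bound on the margin

# Tiny illustrative example with random weights (not from the paper).
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 4))
c = rng.standard_normal(8)
x = rng.random(4)
print(sdp_certificate(W, c, x, eps=0.1))
```

Any true attack corresponds to a rank-one feasible P, so the optimum of this relaxation necessarily upper-bounds the worst-case margin; the certificate the authors actually train with is a differentiable bound of this kind, optimized jointly with the network weights as described above.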