20 Mar 2015 | Ian J. Goodfellow, Jonathon Shlens & Christian Szegedy
The paper by Goodfellow, Shlens, and Szegedy explores the phenomenon of adversarial examples in machine learning models, particularly neural networks. The authors argue that the primary cause of this vulnerability is the excessively linear behavior of these models in high-dimensional input spaces, rather than their nonlinearity. This explanation is supported by quantitative results and yields a simple, fast method for generating adversarial examples (the fast gradient sign method). Using this method for adversarial training, the authors reduce the test set error of a maxout network on MNIST. They also discuss the tension between designing models that are easy to train because they behave linearly and models that would need nonlinear effects to resist adversarial perturbations. The paper covers related work, gives a detailed account of the linear explanation of adversarial examples, and examines the role of model capacity and why adversarial examples generalize across models trained with different architectures and on different subsets of the data. The authors refute alternative hypotheses and conclude that adversarial examples are a consequence of high-dimensional dot products and the linearity of models.
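The "simple method" referred to above is the fast gradient sign method: perturb the input by epsilon times the sign of the gradient of the loss with respect to the input, so that each pixel moves only slightly but the effect on a roughly linear score accumulates with the input dimensionality. The sketch below is a minimal illustration of that idea, not the authors' code; it assumes a differentiable PyTorch classifier `model`, inputs `x` scaled to [0, 1], integer class labels `y`, and a perturbation budget `epsilon`, all of which are illustrative choices.

```python
import torch
import torch.nn.functional as F

def fgsm_perturb(model, x, y, epsilon=0.25):
    """Fast gradient sign method sketch:
    x_adv = x + epsilon * sign(grad_x J(theta, x, y))."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    # Move every input dimension by +/- epsilon in the direction that
    # increases the loss; under a near-linear model the change in the
    # output grows with dimensionality even though each step is tiny.
    x_adv = x + epsilon * x.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()
```

Adversarial training in the paper amounts to mixing such perturbed examples into the training objective, which is what produced the reported error reduction for the maxout network on MNIST.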