7 Feb 2017 | Yanpei Liu*, Xinyun Chen*, Chang Liu, Dawn Song
This paper investigates the transferability of adversarial examples across different deep neural network architectures and a large-scale dataset. The authors first demonstrate that adversarial examples can transfer between models, which poses a significant threat to deep-learning-based applications: an attacker can craft adversarial inputs on a surrogate model and use them against a deployed black-box target. They conduct an extensive study of transferability over large models and a large-scale dataset, and they are the first to study the transferability of targeted adversarial examples together with their target labels.

The study shows that while non-targeted adversarial examples are easy to find and transfer well, targeted adversarial examples generated with existing single-model approaches rarely transfer with their target labels. To address this, the authors propose novel ensemble-based approaches that optimize an adversarial example against an ensemble of models rather than a single one. Using these approaches, they observe, for the first time, a large proportion of targeted adversarial examples transferring with their target labels.

They also present geometric studies to better understand transferable adversarial examples, finding that the gradient directions of different models are nearly orthogonal to each other while their decision boundaries align well, which partially explains why adversarial examples transfer. Finally, they show that adversarial examples generated with the ensemble-based approaches can successfully attack Clarifai.com, a black-box image classification system. The study highlights the importance of understanding the geometric properties of deep neural networks in order to understand and mitigate the risks posed by adversarial examples.
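To make the ensemble idea concrete, here is a minimal PyTorch sketch of a targeted attack in the spirit of the paper's formulation: the perturbation is optimized so that a weighted average of the models' softmax outputs assigns high probability to the target label. The function name, weights, and hyperparameters are illustrative assumptions, not the authors' exact setup.

```python
import torch
import torch.nn.functional as F

def ensemble_targeted_attack(models, weights, x, target,
                             steps=100, lr=0.01, eps=16 / 255):
    """Sketch of an ensemble-based targeted attack (hedged; hyperparameters
    and the L_inf projection are illustrative choices)."""
    delta = torch.zeros_like(x, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        x_adv = (x + delta).clamp(0, 1)
        # Weighted average of per-model softmax predictions.
        probs = sum(w * F.softmax(m(x_adv), dim=1)
                    for m, w in zip(models, weights))
        # Maximize the ensemble probability of the target class.
        loss = -torch.log(probs[:, target] + 1e-12).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
        # Keep the perturbation inside an L_inf ball of radius eps.
        with torch.no_grad():
            delta.clamp_(-eps, eps)
    return (x + delta).detach().clamp(0, 1)
```

The intuition behind attacking the ensemble is that an example fooling several diverse models at once is more likely to lie in a region shared with an unseen black-box model's decision boundary, which is what the paper's geometric analysis suggests.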
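The near-orthogonality finding is also easy to probe empirically. The sketch below, under the assumption that both models share the same input and label format, measures the cosine similarity between the two models' loss gradients at the same input; values near zero reflect the paper's observation.

```python
import torch
import torch.nn.functional as F

def grad_cosine(model_a, model_b, x, y):
    """Cosine similarity between the loss-gradient directions of two models
    at the same input (illustrative sketch; model names are placeholders)."""
    grads = []
    for model in (model_a, model_b):
        x_in = x.clone().requires_grad_(True)
        loss = F.cross_entropy(model(x_in), y)
        # Gradient of the loss with respect to the input image.
        g, = torch.autograd.grad(loss, x_in)
        grads.append(g.flatten())
    return F.cosine_similarity(grads[0], grads[1], dim=0).item()
```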