24 Nov 2015 | Nicolas Papernot*, Patrick McDaniel*, Somesh Jha†, Matt Fredrikson†, Z. Berkay Celik*, Ananthram Swami§
Deep learning, leveraging large datasets and efficient training algorithms, has outperformed other machine learning approaches in various tasks. However, imperfections in the training phase of deep neural networks (DNNs) make them vulnerable to adversarial samples: inputs crafted so that the DNN misclassifies them. This paper formalizes the space of adversaries against DNNs and introduces a novel class of algorithms that craft adversarial samples based on a precise understanding of the mapping between a DNN's inputs and outputs. In a computer vision application, the algorithms reliably produce samples that humans classify correctly but a DNN misclassifies, achieving a 97% adversarial success rate while modifying only 4.02% of the input features per sample on average. The paper also evaluates how vulnerable different classes of samples are to adversarial perturbations and defines a hardness measure. Finally, preliminary work outlines defenses against adversarial samples by defining a predictive measure of the distance between a benign input and a target classification.
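To make the idea of crafting from the input-output mapping concrete, the sketch below illustrates one saliency-guided approach: compute the Jacobian of the model's class probabilities with respect to the input, pick the feature whose increase most helps a chosen target class while hurting the others, and perturb it, repeating until the prediction flips or a budget is exhausted. This is only a minimal illustration, not the paper's exact algorithm; the toy linear-softmax model, the perturbation size `theta`, and the `max_features` budget are assumptions made so the snippet runs end to end.

```python
import jax
import jax.numpy as jnp

# Toy stand-in for a trained DNN: one linear layer with softmax over 10 classes.
# The weights are random placeholders (assumption), not a trained model.
KEY = jax.random.PRNGKey(0)
W = jax.random.normal(KEY, (10, 784)) * 0.01
b = jnp.zeros(10)

def model(x):
    """Return class probabilities for a flattened 28x28 input."""
    return jax.nn.softmax(W @ x + b)

def craft_adversarial(x, target, max_features=30, theta=1.0):
    """Greedily perturb features toward class `target` using a saliency map.

    Each step computes the Jacobian of the output probabilities w.r.t. the
    input, scores features that raise the target class while lowering the
    others, and pushes the highest-scoring feature by `theta`.
    """
    x_adv = x
    for _ in range(max_features):
        if int(jnp.argmax(model(x_adv))) == target:
            break  # already classified as the target class
        jac = jax.jacfwd(model)(x_adv)       # shape: (classes, features)
        target_grad = jac[target]            # d p_target / d x_i
        other_grad = jac.sum(axis=0) - target_grad
        # Saliency: keep features that increase the target and decrease the rest.
        saliency = jnp.where((target_grad > 0) & (other_grad < 0),
                             target_grad * -other_grad, 0.0)
        i = int(jnp.argmax(saliency))
        x_adv = x_adv.at[i].add(theta)
        x_adv = jnp.clip(x_adv, 0.0, 1.0)    # keep features in a valid range
    return x_adv

x = jax.random.uniform(KEY, (784,))
adv = craft_adversarial(x, target=3)
print(int(jnp.argmax(model(x))), int(jnp.argmax(model(adv))))
```

Because only the most salient features are touched at each step, this style of attack tends to modify a small fraction of the input, which is consistent with the small average perturbation reported in the abstract.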