14 Mar 2016 | Nicolas Papernot, Patrick McDaniel, Xi Wu, Somesh Jha, and Ananthram Swami
This paper introduces defensive distillation, a defense mechanism that reduces the effectiveness of adversarial samples against deep neural networks (DNNs). Adversarial samples are inputs crafted to force a DNN to produce adversary-selected outputs, posing serious security risks. Defensive distillation is a training technique that extracts the class-probability knowledge learned by a DNN and uses it to train a second network of the same architecture, improving that network's resilience to adversarial perturbations. The paper analytically and empirically investigates how well defensive distillation reduces the success rate of adversarial sample crafting: the rate drops from 95.89% to 0.45% on a DNN trained on the MNIST dataset, and from 87.89% to 5.11% on a DNN trained on the CIFAR10 dataset. Defensive distillation also raises the average minimum number of input features that must be modified to create an adversarial sample, by about 800% for one DNN and 556% for the other. The paper argues that defensive distillation improves the generalization capabilities of DNNs and reduces their sensitivity to adversarial perturbations by smoothing the model learned during training, and it discusses the theoretical and practical implications of defensive distillation for DNN robustness and security.
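The smoothing comes from the softmax temperature T: during training, class probabilities are computed as F_i(X) = exp(z_i(X)/T) / Σ_j exp(z_j(X)/T) with a high T, a second network of the same architecture is trained on these soft labels at the same temperature, and the distilled network is deployed at T = 1. Below is a minimal sketch of that two-step procedure, assuming PyTorch; the toy `Net` architecture, the helper names (`train_teacher`, `soft_xent`, `distill`), and the hyperparameters (T = 20, Adam, learning rate, epoch counts) are illustrative assumptions, not the paper's exact MNIST/CIFAR10 setups.

```python
# Minimal sketch of defensive distillation (assumed PyTorch implementation).
import torch
import torch.nn as nn
import torch.nn.functional as F

T = 20.0  # distillation temperature; the paper evaluates a range of values

class Net(nn.Module):
    """Toy MLP standing in for the paper's convolutional architectures."""
    def __init__(self, in_dim=784, n_classes=10):
        super().__init__()
        self.fc1 = nn.Linear(in_dim, 256)
        self.fc2 = nn.Linear(256, n_classes)

    def forward(self, x):
        return self.fc2(F.relu(self.fc1(x)))  # raw logits z(X)

def train_teacher(teacher, loader, epochs=10, lr=1e-3):
    # Step 1: train the initial network on hard labels, with the softmax
    # evaluated at temperature T (logits divided by T).
    opt = torch.optim.Adam(teacher.parameters(), lr=lr)
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss = F.cross_entropy(teacher(x) / T, y)
            loss.backward()
            opt.step()
    return teacher

def soft_xent(student_logits, teacher_probs, T):
    # Cross-entropy of the student's temperature-T softmax against the
    # teacher's soft labels.
    log_p = F.log_softmax(student_logits / T, dim=1)
    return -(teacher_probs * log_p).sum(dim=1).mean()

def distill(teacher, student, loader, epochs=10, lr=1e-3):
    # Step 2: train a second network of the same architecture on the
    # teacher's soft labels, at the same temperature T.
    opt = torch.optim.Adam(student.parameters(), lr=lr)
    teacher.eval()
    for _ in range(epochs):
        for x, _ in loader:  # hard labels are ignored in this step
            with torch.no_grad():
                soft = F.softmax(teacher(x) / T, dim=1)  # soft labels
            opt.zero_grad()
            loss = soft_xent(student(x), soft, T)
            loss.backward()
            opt.step()
    return student  # Step 3: deploy the distilled network at T = 1
```

Training at a high T and then evaluating at T = 1 effectively scales the logits up relative to what the network saw during training, saturating the softmax and shrinking the input gradients that adversarial-crafting algorithms exploit.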