Distillation as a Defense to Adversarial Perturbations against Deep Neural Networks


14 Mar 2016 | Nicolas Papernot*, Patrick McDaniel*, Xi Wu†, Somesh Jha†, and Ananthram Swami‡
Deep learning algorithms, while performing well on various machine learning problems, are vulnerable to adversarial samples—inputs crafted to force deep neural networks (DNNs) to produce adversary-selected outputs. These attacks can have severe consequences, such as crashing autonomous vehicles or bypassing content filters. To address this issue, the authors introduce *defensive distillation*, a training mechanism that enhances the robustness of DNNs against adversarial perturbations. They analytically investigate the generalizability and robustness properties of defensive distillation and empirically study its effectiveness on two DNNs in adversarial settings. The results show that defensive distillation reduces the success rate of adversarial sample crafting from 95% to less than 0.5%, and increases the average minimum number of input features that must be modified to create adversarial samples by about 800%. The mechanism works by using the knowledge extracted during distillation to smooth the model's gradients, making it less sensitive to input perturbations. This approach does not require modifications to the DNN architecture and has minimal overhead during training and testing.
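
To make the procedure concrete, here is a minimal sketch of the defensive distillation training loop described above: train a teacher network with a softmax at temperature T, use its softened output probabilities as labels to train a second network of the same architecture at the same temperature, then deploy that distilled network at temperature 1. The toy MLP, random stand-in data, learning rate, and the specific temperature value are illustrative assumptions, not the authors' exact experimental setup.

```python
# Sketch of defensive distillation (after Papernot et al., 2016).
# Assumptions: toy MLP, random data standing in for a real dataset,
# and a single temperature T; not the paper's exact architecture or hyperparameters.
import torch
import torch.nn as nn
import torch.nn.functional as F

T = 20.0  # distillation temperature (hyperparameter; the paper studies several values)

def make_model():
    # Teacher and distilled model share the same architecture in defensive distillation.
    return nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))

def train(model, inputs, targets, soft_targets=False, epochs=5):
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(epochs):
        logits = model(inputs)
        # Softmax at temperature T smooths the output distribution during training.
        log_probs = F.log_softmax(logits / T, dim=1)
        if soft_targets:
            # Cross-entropy against the teacher's soft probability labels.
            loss = -(targets * log_probs).sum(dim=1).mean()
        else:
            # Standard cross-entropy against hard class labels.
            loss = F.nll_loss(log_probs, targets)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model

# Toy data standing in for MNIST-like inputs (hypothetical shapes).
X = torch.randn(512, 784)
y = torch.randint(0, 10, (512,))

# Step 1: train the teacher at temperature T on hard labels.
teacher = train(make_model(), X, y)

# Step 2: generate soft labels from the teacher at the same temperature.
with torch.no_grad():
    soft_labels = F.softmax(teacher(X) / T, dim=1)

# Step 3: train the distilled model on the soft labels at temperature T.
distilled = train(make_model(), X, soft_labels, soft_targets=True)

# Step 4: at test time the distilled model is used at temperature 1; the
# smoothed decision surface makes gradient-based adversarial crafting harder.
preds = distilled(X).argmax(dim=1)
```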