Safety Verification of Deep Neural Networks


5 May 2017 | Xiaowei Huang, Marta Kwiatkowska, Sen Wang and Min Wu
This paper presents a novel automated verification framework for ensuring the safety of classification decisions made by feed-forward deep neural networks. The framework is based on Satisfiability Modulo Theories (SMT) and verifies that a network's classification decision remains invariant under small perturbations (manipulations) within a specified region of the input space. The key idea is to define safety for a classification decision as invariance of the classification within a small neighborhood of the original input. The framework enables exhaustive search of this region by employing discretisation and propagates the analysis layer by layer. The method works directly with the network code and can guarantee the discovery of adversarial examples if they exist; when found, these examples can be used to fine-tune the network or be shown to human testers.

The paper introduces the concept of "ladders" as a way to iteratively and nondeterministically apply manipulations to explore the region around a given input. Safety is defined in terms of the invariance of the classification decision under these manipulations, and the analysis is exhaustive and complete under the assumption of minimality of manipulations.
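To make the search idea concrete, the sketch below illustrates in plain Python how classification invariance can be checked exhaustively on a discretised neighborhood of an input, in the spirit of the paper's region-and-manipulations setup. It is a minimal illustration rather than the authors' tool; the `classify` function and the `span`, `step`, and `dims` parameters are assumptions made for the example.

```python
import itertools
import numpy as np

def region_is_safe(x, classify, span=0.1, step=0.05, dims=None):
    """Exhaustively check classification invariance on a discretised
    neighborhood of x.

    classify : function mapping an input vector to a label (assumed)
    span     : per-dimension perturbation bound defining the region
    step     : discretisation grid spacing
    dims     : indices of the dimensions to manipulate (all by default)

    Returns (True, None) if every grid point keeps the original label,
    otherwise (False, counterexample).
    """
    original = classify(x)
    dims = range(len(x)) if dims is None else dims
    offsets = np.arange(-span, span + 1e-9, step)
    # Cartesian product of per-dimension offsets enumerates every
    # combination of discrete manipulations reachable inside the region.
    for combo in itertools.product(offsets, repeat=len(dims)):
        x2 = x.copy()
        for d, delta in zip(dims, combo):
            x2[d] += delta
        if classify(x2) != original:
            return False, x2          # adversarial example found
    return True, None                 # region exhaustively checked
```

Because the grid grows exponentially with the number of manipulated dimensions, the paper's layer-by-layer propagation and its selection of features to manipulate are what make this kind of exhaustive exploration tractable in practice.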
The techniques are implemented using Z3 and evaluated on state-of-the-art networks, including regularised and deep learning networks, and the framework is compared against existing techniques for searching for adversarial examples and estimating network robustness. The method is applied to various image classification tasks, including MNIST, CIFAR10, GTSRB, and ILSVRC, demonstrating its effectiveness in verifying the safety of neural network decisions. The framework is particularly useful for applications such as autonomous driving, where the safety of classification decisions is critical.
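The SMT component can be pictured with Z3's Python bindings (the z3-solver package): encode the network's arithmetic symbolically, constrain the input to the region around the original point, and ask whether any point in that region lets another class overtake the original one. The two-layer ReLU network, its weights, the reference input, and the region size below are illustrative placeholders chosen for this sketch, not values taken from the paper.

```python
from z3 import Real, RealVal, Solver, And, If, sat

# Toy 2-2-2 ReLU network with placeholder weights (not from the paper).
W1 = [[1.0, -1.0], [0.5, 2.0]]   # hidden unit j uses row W1[j]
b1 = [0.0, -0.5]
W2 = [[1.5, -1.0], [-0.5, 1.0]]  # output k uses row W2[k]
b2 = [0.0, 0.0]

x_ref = [0.3, 0.7]   # input whose decision we want to verify (class 1 wins here)
eps   = 0.05         # radius of the box-shaped region around x_ref

def relu(e):
    # ReLU encoded as an SMT if-then-else term.
    return If(e > 0, e, 0)

x = [Real('x0'), Real('x1')]                           # symbolic input
h = [relu(sum(RealVal(W1[j][i]) * x[i] for i in range(2)) + b1[j])
     for j in range(2)]                                # hidden layer
y = [sum(RealVal(W2[k][j]) * h[j] for j in range(2)) + b2[k]
     for k in range(2)]                                # output scores

s = Solver()
# Keep the input inside the box of radius eps around x_ref ...
s.add(And(*[And(x[i] >= x_ref[i] - eps, x[i] <= x_ref[i] + eps)
            for i in range(2)]))
# ... and ask whether class 0 can overtake class 1 anywhere in that box.
s.add(y[0] >= y[1])

if s.check() == sat:
    print('adversarial point in the region:', s.model())
else:
    print('decision invariant on the encoded region')
```

An unsat answer means the toy decision is invariant on the encoded region, while a sat answer yields a concrete adversarial point from the model returned by the solver, mirroring the paper's guarantee that adversarial examples are found whenever they exist in the searched region.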