A Simple Unified Framework for Detecting Out-of-Distribution Samples and Adversarial Attacks

27 Oct 2018 | Kimin Lee, Kibok Lee, Honglak Lee, Jinwoo Shin
This paper proposes a simple yet effective method for detecting abnormal test samples, both out-of-distribution (OOD) and adversarial, that can be applied to any pre-trained softmax neural classifier without retraining. The method interprets the classifier as a generative classifier under Gaussian discriminant analysis (GDA), fitting class-conditional Gaussian distributions to the low- and high-level features of the deep model. The confidence score for a test sample is its Mahalanobis distance to the closest class-conditional distribution. Evaluated on CIFAR, SVHN, ImageNet, and LSUN, this score outperforms existing methods at detecting both OOD samples and adversarial attacks, and it remains robust in extreme scenarios such as small training sets or noisy labels. The same framework extends to class-incremental learning, where new classes can be incorporated without retraining the deep model, and the authors note it could also benefit related tasks such as active learning, ensemble learning, and few-shot learning.
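To make the scoring procedure concrete, the following is a minimal sketch (not the authors' reference implementation) of the Mahalanobis-based confidence score described above. It assumes `features` is an (N, d) array of penultimate-layer features extracted from the training set, `labels` their class indices, and `f_x` a (d,) feature vector of a test sample; the input pre-processing and multi-layer feature ensembling from the paper are omitted.

```python
import numpy as np

def fit_class_gaussians(features, labels, num_classes):
    """Estimate per-class means and a single shared (tied) covariance,
    as in Gaussian discriminant analysis."""
    d = features.shape[1]
    means = np.zeros((num_classes, d))
    cov = np.zeros((d, d))
    for c in range(num_classes):
        fc = features[labels == c]
        means[c] = fc.mean(axis=0)
        centered = fc - means[c]
        cov += centered.T @ centered
    cov /= features.shape[0]
    # Pseudo-inverse for numerical stability when the covariance is ill-conditioned.
    return means, np.linalg.pinv(cov)

def mahalanobis_confidence(f_x, means, precision):
    """Confidence score: negative squared Mahalanobis distance to the closest
    class-conditional Gaussian. Lower scores indicate abnormal (OOD or
    adversarial) samples."""
    diffs = means - f_x                                   # (num_classes, d)
    dists = np.einsum('cd,de,ce->c', diffs, precision, diffs)
    return -dists.min()
```

A test sample is then flagged as abnormal when its confidence score falls below a threshold chosen on validation data, mirroring the thresholding used for the detection experiments reported in the paper.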