2024 | Xue Jiang, Feng Liu, Zhen Fang, Hong Chen, Tongliang Liu, Feng Zheng, Bo Han
This paper proposes NegLabel, a post hoc zero-shot out-of-distribution (OOD) detection method that taps the knowledge of vision-language models (VLMs) by incorporating a large number of negative labels. The negative labels are selected with a NegMining algorithm so that they are semantically far from the in-distribution (ID) labels, which sharpens the separation between ID and OOD samples. The NegLabel score is positively correlated with an image's similarity to the ID labels and negatively correlated with its similarity to the negative labels, so a sample is judged by examining its affinities with both types of labels.
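To make the two ingredients concrete, here is a minimal sketch (not the authors' code) of a NegMining-style selection and a NegLabel-style score, assuming L2-normalized CLIP-like embeddings held in NumPy arrays; the percentile-based distance measure, the temperature value, and the array names (`id_text_emb`, `cand_text_emb`, `img_emb`) are illustrative assumptions rather than the paper's exact choices.

```python
import numpy as np

# Assumed inputs (all L2-normalized):
#   id_text_emb:   (K, d) embeddings of the ID label prompts
#   cand_text_emb: (C, d) embeddings of candidate words from a large corpus
#   img_emb:       (d,)   embedding of the test image

def negmining(id_text_emb, cand_text_emb, num_neg=10000, percentile=5):
    """Pick candidate labels that are semantically far from the ID labels."""
    sim = cand_text_emb @ id_text_emb.T                  # (C, K) cosine similarities
    # Distance of a candidate to the ID label space: a low percentile of
    # (1 - similarity), so a candidate close to even a few ID labels is penalized.
    dist = np.percentile(1.0 - sim, percentile, axis=1)  # (C,)
    top = np.argsort(-dist)[:num_neg]                    # farthest candidates first
    return cand_text_emb[top]

def neglabel_score(img_emb, id_text_emb, neg_text_emb, tau=0.01):
    """Higher score -> more ID-like; lower score -> more OOD-like."""
    id_aff = np.exp(img_emb @ id_text_emb.T / tau)       # affinities to ID labels
    neg_aff = np.exp(img_emb @ neg_text_emb.T / tau)     # affinities to negative labels
    return id_aff.sum() / (id_aff.sum() + neg_aff.sum())
```

The score rises when the image aligns with ID labels and falls when it aligns with the negative labels, matching the correlation structure described above.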
Evaluated on standard OOD detection benchmarks, NegLabel achieves state-of-the-art performance, notably on the ImageNet-1k benchmark, and it generalizes across multiple VLM architectures while remaining robust to diverse domain shifts. By exploiting the text comprehension capabilities of VLMs, it outperforms existing methods in both zero-shot and fully supervised settings. The method also employs a grouping strategy over the negative labels to reduce false positives and improve robustness, as sketched below. Theoretical analysis supports the design, and extensive experiments validate its performance across different scenarios.
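A minimal sketch of the grouping idea, assuming it works by partitioning the negative labels into groups, scoring the image against the ID labels plus one group at a time, and averaging the per-group scores; the group count is an illustrative choice, and `neglabel_score` is the hypothetical helper from the sketch above.

```python
import numpy as np

def grouped_neglabel_score(img_emb, id_text_emb, neg_text_emb,
                           num_groups=100, tau=0.01):
    """Average the NegLabel score over disjoint groups of negative labels."""
    groups = np.array_split(neg_text_emb, num_groups)
    scores = [neglabel_score(img_emb, id_text_emb, g, tau) for g in groups]
    return float(np.mean(scores))
```

Averaging over groups damps the influence of any single negative label that happens to match an ID image, which is one plausible reading of how the grouping reduces false positives.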