How Does Unlabeled Data Provably Help Out-of-Distribution Detection?
2024 | Xuefeng Du, Zhen Fang, Ilias Diakonikolas, Yixuan Li
This paper introduces SAL (Separate And Learn), a learning framework for out-of-distribution (OOD) detection that exploits unlabeled in-the-wild data. SAL proceeds in two steps: it first separates candidate outliers from the unlabeled wild data, then trains a binary OOD classifier on those candidate outliers together with the labeled in-distribution (ID) data. Theoretical analysis shows that the separation step filters outliers with a provably small error rate, which in turn yields a generalization guarantee for the learned classifier: its generalization error is upper bounded by the risk associated with the optimal OOD classifier. The analysis applies to non-convex models such as modern neural networks.
Empirically, SAL achieves state-of-the-art performance on common benchmarks, showing significant improvements over competing methods, particularly on CIFAR-100. The paper concludes that SAL provides both rigorous theoretical guarantees and strong empirical effectiveness for OOD detection using unlabeled wild data.
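The two-step separate-and-learn pipeline can be sketched on toy data. This is a minimal illustration, not the paper's method: the distance-to-ID-mean filtering score below is a hypothetical stand-in for SAL's actual gradient-based separation procedure, and all names, thresholds, and the tiny logistic-regression classifier are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Labeled in-distribution (ID) data, and an unlabeled "wild" mixture of ID + outliers.
id_data = rng.normal(loc=0.0, scale=1.0, size=(200, 2))
wild_id = rng.normal(loc=0.0, scale=1.0, size=(150, 2))
wild_ood = rng.normal(loc=6.0, scale=1.0, size=(50, 2))
wild = np.vstack([wild_id, wild_ood])

# Step 1 (separate): score each wild point and filter candidate outliers.
# Here, distance to the ID mean is a toy proxy for SAL's filtering score.
mu = id_data.mean(axis=0)
scores = np.linalg.norm(wild - mu, axis=1)
threshold = np.quantile(scores, 0.75)  # keep the top quartile as candidate outliers
candidates = wild[scores > threshold]

# Step 2 (learn): train a binary OOD classifier on ID data vs. candidate outliers.
X = np.vstack([id_data, candidates])
y = np.concatenate([np.zeros(len(id_data)), np.ones(len(candidates))])

# Tiny logistic regression via gradient descent (no external ML library).
w = np.zeros(X.shape[1])
b = 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted P(outlier)
    g = p - y                               # gradient of the logistic loss
    w -= 0.1 * (X.T @ g) / len(y)
    b -= 0.1 * g.mean()

def is_ood(x):
    """Flag a point as OOD when the classifier's outlier probability exceeds 0.5."""
    return (1.0 / (1.0 + np.exp(-(x @ w + b)))) > 0.5
```

On this toy mixture, a point near the ID cluster is kept as in-distribution while a point near the outlier cluster is flagged, mirroring the separation-then-classification structure the paper analyzes.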