Robust Semi-supervised Learning by Wisely Leveraging Open-set Data

20 May 2024 | Yang Yang, Member, IEEE, Nan Jiang, Yi Xu, and De-Chuan Zhan
WiseOpen is a robust open-set semi-supervised learning (OSSL) framework that selectively leverages open-set data to improve the model's classification of in-distribution (ID) data. Traditional OSSL methods often train on all available open-set data, which can include unfriendly samples that degrade performance. WiseOpen addresses this with a gradient-variance-based selection mechanism (GV-SM) that identifies the friendly open-set data and uses only that subset, so that training focuses on samples that contribute positively to ID classification.

Two practical variants, WiseOpen-E and WiseOpen-L, are proposed to reduce computational cost. WiseOpen-E updates the selection of open-set data at a lower frequency, while WiseOpen-L replaces the gradient-variance criterion with a loss-based selection mechanism (LSM). Both variants maintain the effectiveness of WiseOpen while being more computationally efficient. Extensive experiments on benchmark datasets such as CIFAR-10, CIFAR-100, and Tiny-ImageNet show that WiseOpen and its variants outperform existing methods in ID classification and out-of-distribution (OOD) detection. Theoretical analysis supports both GV-SM and LSM, showing that selecting friendly open-set data leads to better generalization. Overall, the results indicate that WiseOpen offers a robust and efficient approach to OSSL by choosing carefully which open-set data to learn from.
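The selection idea can be illustrated with a small toy sketch. This summary does not give the exact GV-SM or LSM formulas, so everything below is an assumption made for illustration, not the authors' implementation: a linear logistic model stands in for the network, pseudo-labels stand in for the unsupervised objective, the "friendliness" score is the squared distance between a sample's gradient and the mean gradient on labeled data, and `keep_ratio` and `should_refresh` are hypothetical names.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def grad_and_loss(w, x, y):
    """Per-sample gradient and binary cross-entropy loss for a linear model."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
    p = min(max(p, 1e-12), 1.0 - 1e-12)  # avoid log(0)
    loss = -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))
    grad = [(p - y) * xi for xi in x]
    return grad, loss


def pseudo_label(w, x):
    """Hard pseudo-label from the current model (assumption: stands in for
    whatever unsupervised target the real method uses)."""
    return 1.0 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else 0.0


def lowest_k(scores, keep_ratio):
    """Indices of the lowest-scoring fraction of samples, in original order."""
    k = max(1, int(len(scores) * keep_ratio))
    return sorted(sorted(range(len(scores)), key=scores.__getitem__)[:k])


def select_by_gradient(w, labeled, open_set, keep_ratio=0.5):
    """Gradient-based proxy: keep samples whose gradient stays close to the
    mean gradient on labeled ID data (illustrative stand-in for GV-SM)."""
    grads = [grad_and_loss(w, x, y)[0] for x, y in labeled]
    ref = [sum(col) / len(grads) for col in zip(*grads)]
    scores = []
    for x in open_set:
        g, _ = grad_and_loss(w, x, pseudo_label(w, x))
        scores.append(sum((gi - ri) ** 2 for gi, ri in zip(g, ref)))
    return lowest_k(scores, keep_ratio)


def select_by_loss(w, open_set, keep_ratio=0.5):
    """Cheaper loss-based proxy (illustrative stand-in for LSM):
    no per-sample gradients are needed."""
    losses = [grad_and_loss(w, x, pseudo_label(w, x))[1] for x in open_set]
    return lowest_k(losses, keep_ratio)


def should_refresh(epoch, every):
    """Lower update frequency, in the spirit of WiseOpen-E:
    recompute the selection only every `every` epochs."""
    return epoch % every == 0


# Toy data: the model only uses feature 0; samples 1 and 3 are dominated
# by an unrelated feature and sit near the decision boundary.
w = [1.0, 0.0]
labeled = [([2.0, 0.0], 1.0), ([-2.0, 0.0], 0.0)]
open_set = [[3.0, 0.0], [0.1, 5.0], [-3.0, 0.0], [-0.1, -5.0]]

friendly_gv = select_by_gradient(w, labeled, open_set)
friendly_loss = select_by_loss(w, open_set)
```

On this toy data both criteria keep the two samples that resemble the labeled data and drop the two driven by the unrelated feature. In a real training loop the selected indices would be cached and recomputed only when `should_refresh(epoch, every)` is true, which is where the cost saving of a lower update frequency comes from.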