Robust Semi-supervised Learning by Wisely Leveraging Open-set Data

20 May 2024 | Yang Yang, Member, IEEE, Nan Jiang, Yi Xu, and De-Chuan Zhan
The paper addresses the challenge of Open-Set Semi-Supervised Learning (OSSL), where unlabeled data may include out-of-distribution (OOD) samples from unseen classes that can degrade model performance. Traditional OSSL approaches use the entire set of open-set data during training, which can include unfriendly data that harms performance. To tackle this issue, the authors propose Wise Open-set Semi-supervised Learning (WiseOpen), a framework that selectively leverages open-set data via a gradient-variance-based selection mechanism, exploiting only a friendly subset to enhance the model's ability to classify in-distribution (ID) samples.
The paper also introduces two practical variants, WiseOpen-E and WiseOpen-L, which reduce the computational cost of this selection while maintaining its performance gains. Extensive experiments on benchmark datasets demonstrate that WiseOpen and its variants outperform state-of-the-art methods on ID classification. Together, the theoretical analysis and empirical findings support the necessity of carefully selecting, rather than indiscriminately using, open-set data in OSSL.
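To make the selection idea concrete, the sketch below ranks unlabeled samples by how much each one's gradient deviates from the mean gradient (a simple proxy for its contribution to gradient variance) and keeps only the low-deviation "friendly" fraction. This is an illustrative simplification, not the paper's exact criterion; the function name, the squared-deviation proxy, and the fixed keep ratio are all assumptions for demonstration.

```python
import numpy as np

def select_friendly_samples(per_sample_grads, keep_ratio=0.5):
    """Illustrative sketch of gradient-variance-based selection.

    per_sample_grads: (N, D) array, one flattened gradient per unlabeled sample.
    keep_ratio: fraction of samples to retain as the "friendly" subset.
    Returns the indices of the retained samples.
    """
    mean_grad = per_sample_grads.mean(axis=0)
    # Squared deviation from the mean gradient: samples whose gradients
    # disagree strongly with the average inflate the gradient variance
    # and are treated as "unfriendly" open-set data here.
    deviations = ((per_sample_grads - mean_grad) ** 2).sum(axis=1)
    k = max(1, int(keep_ratio * len(deviations)))
    return np.argsort(deviations)[:k]

# Toy usage: five samples with similar gradients plus one outlier.
rng = np.random.default_rng(0)
grads = np.vstack([rng.normal(0.0, 0.1, (5, 4)),   # "friendly" cluster
                   rng.normal(5.0, 0.1, (1, 4))])  # OOD-like outlier
friendly = select_friendly_samples(grads, keep_ratio=0.5)
print(sorted(friendly.tolist()))  # the outlier (index 5) is excluded
```

In an actual OSSL pipeline, per-sample gradients would come from the model's loss on each unlabeled example, and the selection would be recomputed periodically during training rather than once.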