DIVIDEMIX: LEARNING WITH NOISY LABELS AS SEMI-SUPERVISED LEARNING
18 Feb 2020 | Junnan Li, Richard Socher, Steven C.H. Hoi
DivideMix is a novel framework for learning with noisy labels that leverages semi-supervised learning techniques. The method dynamically divides the training data into a labeled set of likely-clean samples and an unlabeled set of likely-noisy samples by fitting a Gaussian Mixture Model (GMM) to the per-sample loss distribution. Two diverged networks are trained simultaneously, each using the dataset division produced by the other, which avoids confirmation bias. During semi-supervised training, label co-refinement and co-guessing further improve the handling of label noise. Experiments on multiple benchmark datasets show significant improvements over state-of-the-art methods.
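The co-divide step above can be sketched in a few lines: fit a two-component GMM to the per-sample losses and treat the posterior probability of the low-mean component as each sample's "clean" probability. This is a minimal, simplified sketch (the function name, threshold, and regularization value are illustrative, not from the paper):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def co_divide(losses, threshold=0.5):
    """Fit a 2-component GMM to per-sample losses and return a boolean
    mask of samples deemed clean, plus each sample's clean probability."""
    losses = np.asarray(losses, dtype=np.float64).reshape(-1, 1)
    gmm = GaussianMixture(n_components=2).fit(losses)
    # The component with the smaller mean corresponds to clean samples.
    clean_component = int(np.argmin(gmm.means_.ravel()))
    clean_prob = gmm.predict_proba(losses)[:, clean_component]
    return clean_prob > threshold, clean_prob
```

In DivideMix each network applies this division to the losses computed by the *other* network, so that errors in one network's loss estimates do not directly feed back into its own training set.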
DivideMix outperforms existing approaches under both symmetric and asymmetric label noise, achieving better performance on CIFAR-10, CIFAR-100, Clothing1M, and WebVision. The method effectively filters out noisy samples and reuses them as unlabeled data to regularize training. Ablation studies confirm the contribution of key components such as co-divide, label refinement, and input augmentation. DivideMix is also efficient in training time: slightly slower than Co-teaching+ but faster than several other methods. Its ability to handle noisy labels through semi-supervised learning makes it a promising approach for improving model generalization in real-world scenarios with label noise.
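The two label-handling operations mentioned above can be illustrated as follows. Co-refinement blends a labeled sample's given label with the network's prediction, weighted by the GMM clean probability; co-guessing averages the two networks' predictions for an unlabeled sample. Both are followed by temperature sharpening (the paper uses T = 0.5, i.e. squaring). This is a single-sample sketch with hypothetical function names; the paper additionally averages predictions over multiple augmentations:

```python
import numpy as np

def co_refine(given_label, pred, clean_prob):
    """Label co-refinement for a labeled sample: blend the given one-hot
    label with the network's prediction, weighted by the clean probability."""
    refined = clean_prob * given_label + (1.0 - clean_prob) * pred
    sharpened = refined ** 2  # temperature sharpening with T = 0.5
    return sharpened / sharpened.sum()

def co_guess(pred_net1, pred_net2):
    """Label co-guessing for an unlabeled sample: average both networks'
    predictions, then sharpen to reduce entropy."""
    avg = 0.5 * (pred_net1 + pred_net2)
    sharpened = avg ** 2
    return sharpened / sharpened.sum()
```

Because the guessed label uses both networks, a confident mistake by one network is dampened by the other, which is the same co-training intuition behind the co-divide step.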