DIVIDEMIX: LEARNING WITH NOISY LABELS AS SEMI-SUPERVISED LEARNING
18 Feb 2020 | Junnan Li, Richard Socher, Steven C.H. Hoi
DivideMix is a novel framework for learning with noisy labels that leverages semi-supervised learning techniques. The method dynamically divides the training data into a labeled set of likely-clean samples and an unlabeled set of likely-noisy samples by fitting a Gaussian Mixture Model (GMM) to the per-sample loss distribution. Two diverged networks are trained simultaneously, each using the dataset division produced by the other, which avoids confirmation bias. During semi-supervised training, label co-refinement and co-guessing further improve the handling of label noise. Experiments on multiple benchmark datasets show significant improvements over state-of-the-art methods.
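The co-divide step above can be sketched in a few lines: fit a two-component GMM to the per-sample losses and treat the posterior probability of the low-mean component as each sample's "clean" probability. This is a minimal, simplified sketch (the function name, threshold, and regularization value are illustrative, not from the paper):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def co_divide(losses, threshold=0.5):
    """Fit a 2-component GMM to per-sample losses and return a boolean
    mask of samples deemed clean, plus each sample's clean probability."""
    losses = np.asarray(losses, dtype=np.float64).reshape(-1, 1)
    gmm = GaussianMixture(n_components=2).fit(losses)
    # The component with the smaller mean corresponds to clean samples.
    clean_component = int(np.argmin(gmm.means_.ravel()))
    clean_prob = gmm.predict_proba(losses)[:, clean_component]
    return clean_prob > threshold, clean_prob
```

In DivideMix each network applies this division to the losses computed by the *other* network, so that errors in one network's loss estimates do not directly feed back into its own training set.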
DivideMix outperforms existing approaches under both symmetric and asymmetric label noise, achieving better performance on CIFAR-10, CIFAR-100, Clothing1M, and WebVision. The method effectively filters out noisy samples and reuses them as unlabeled data to regularize training. Ablation studies confirm the contribution of key components such as co-divide, label refinement, and input augmentation. DivideMix is also efficient in training time: slightly slower than Co-teaching+ but faster than several other methods. Its ability to handle noisy labels through semi-supervised learning makes it a promising approach for improving model generalization in real-world scenarios with label noise.
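The two label-handling operations mentioned above can be illustrated as follows. Co-refinement blends a labeled sample's given label with the network's prediction, weighted by the GMM clean probability; co-guessing averages the two networks' predictions for an unlabeled sample. Both are followed by temperature sharpening (the paper uses T = 0.5, i.e. squaring). This is a single-sample sketch with hypothetical function names; the paper additionally averages predictions over multiple augmentations:

```python
import numpy as np

def co_refine(given_label, pred, clean_prob):
    """Label co-refinement for a labeled sample: blend the given one-hot
    label with the network's prediction, weighted by the clean probability."""
    refined = clean_prob * given_label + (1.0 - clean_prob) * pred
    sharpened = refined ** 2  # temperature sharpening with T = 0.5
    return sharpened / sharpened.sum()

def co_guess(pred_net1, pred_net2):
    """Label co-guessing for an unlabeled sample: average both networks'
    predictions, then sharpen to reduce entropy."""
    avg = 0.5 * (pred_net1 + pred_net2)
    sharpened = avg ** 2
    return sharpened / sharpened.sum()
```

Because the guessed label uses both networks, a confident mistake by one network is dampened by the other, which is the same co-training intuition behind the co-divide step.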