5 Nov 2020 | Qizhe Xie, Zihang Dai, Eduard Hovy, Minh-Thang Luong, Quoc V. Le
This paper introduces Unsupervised Data Augmentation (UDA), a method that improves semi-supervised learning by replacing traditional noise injection with advanced data augmentation in consistency training. The key idea is that high-quality augmentation methods, which are already effective in supervised learning, can also enhance semi-supervised learning by generating diverse and realistic perturbations of unlabeled examples.

UDA is evaluated on a range of language and vision tasks and shows significant improvements over existing methods. On the IMDb text classification dataset with only 20 labeled examples, UDA achieves an error rate of 4.20, outperforming a state-of-the-art model trained on 25,000 labeled examples. On the CIFAR-10 benchmark with only 250 labeled examples, UDA achieves an error rate of 5.43, outperforming all previous approaches. UDA also combines well with transfer learning, such as fine-tuning from BERT, and remains effective in high-data regimes like ImageNet. A theoretical analysis suggests that more diverse and realistic augmentation reduces the number of connected components in the data graph, allowing better performance with fewer labeled examples. Across tasks, UDA consistently outperforms existing semi-supervised learning methods in both the language and vision domains.
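To make the core idea concrete, here is a minimal PyTorch-style sketch of one UDA training step, combining a supervised cross-entropy loss with a consistency (KL-divergence) loss between predictions on original and augmented unlabeled examples. The names `model`, `augment`, and `lam` are placeholders, not from the paper's code: `augment` stands in for a strong augmentation such as back-translation (text) or RandAugment (images).

```python
import torch
import torch.nn.functional as F

def uda_step(model, labeled_x, labeled_y, unlabeled_x, augment, lam=1.0):
    """One UDA-style training step (illustrative sketch, not the official code).

    model     -- any classifier returning logits
    augment   -- placeholder for a strong data augmentation function
    lam       -- weight on the unsupervised consistency term
    """
    # Supervised cross-entropy on the small labeled batch.
    sup_loss = F.cross_entropy(model(labeled_x), labeled_y)

    # Predictions on the original unlabeled examples serve as fixed targets;
    # no gradient flows through them.
    with torch.no_grad():
        target = F.softmax(model(unlabeled_x), dim=-1)

    # Consistency loss: KL divergence between predictions on augmented
    # versions and the targets from the original unlabeled examples.
    aug_logits = model(augment(unlabeled_x))
    consistency = F.kl_div(
        F.log_softmax(aug_logits, dim=-1), target, reduction="batchmean"
    )

    return sup_loss + lam * consistency
```

The single hyperparameter `lam` trades off fitting the few labeled examples against enforcing prediction consistency on the much larger pool of unlabeled data.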