5 Nov 2020 | Qizhe Xie, Zihang Dai, Eduard Hovy, Minh-Thang Luong, Quoc V. Le
This paper introduces Unsupervised Data Augmentation (UDA), a method that improves semi-supervised learning by replacing traditional noise injection with advanced data augmentation in consistency training. The key idea is that high-quality augmentation methods, which are already effective in supervised learning, can also enhance semi-supervised learning by generating diverse and realistic perturbations of unlabeled examples.

UDA is evaluated on a range of language and vision tasks and shows significant improvements over existing methods. On the IMDb text classification dataset with only 20 labeled examples, UDA achieves an error rate of 4.20, outperforming a state-of-the-art model trained on 25,000 labeled examples. On the CIFAR-10 benchmark with only 250 labeled examples, UDA achieves an error rate of 5.43, outperforming all previous approaches. UDA also combines well with transfer learning, such as fine-tuning from BERT, and remains effective in high-data regimes like ImageNet. A theoretical analysis suggests that more diverse and realistic augmentation reduces the number of connected components in the data graph, allowing better performance with fewer labeled examples. Across tasks, UDA consistently outperforms existing semi-supervised learning methods in both the language and vision domains.
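To make the core idea concrete, here is a minimal PyTorch-style sketch of one UDA training step, combining a supervised cross-entropy loss with a consistency (KL-divergence) loss between predictions on original and augmented unlabeled examples. The names `model`, `augment`, and `lam` are placeholders, not from the paper's code: `augment` stands in for a strong augmentation such as back-translation (text) or RandAugment (images).

```python
import torch
import torch.nn.functional as F

def uda_step(model, labeled_x, labeled_y, unlabeled_x, augment, lam=1.0):
    """One UDA-style training step (illustrative sketch, not the official code).

    model     -- any classifier returning logits
    augment   -- placeholder for a strong data augmentation function
    lam       -- weight on the unsupervised consistency term
    """
    # Supervised cross-entropy on the small labeled batch.
    sup_loss = F.cross_entropy(model(labeled_x), labeled_y)

    # Predictions on the original unlabeled examples serve as fixed targets;
    # no gradient flows through them.
    with torch.no_grad():
        target = F.softmax(model(unlabeled_x), dim=-1)

    # Consistency loss: KL divergence between predictions on augmented
    # versions and the targets from the original unlabeled examples.
    aug_logits = model(augment(unlabeled_x))
    consistency = F.kl_div(
        F.log_softmax(aug_logits, dim=-1), target, reduction="batchmean"
    )

    return sup_loss + lam * consistency
```

The single hyperparameter `lam` trades off fitting the few labeled examples against enforcing prediction consistency on the much larger pool of unlabeled data.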