mixup: BEYOND EMPIRICAL RISK MINIMIZATION

27 Apr 2018 | Hongyi Zhang, Moustapha Cisse, Yann N. Dauphin, David Lopez-Paz
Mixup is a simple learning principle that improves the generalization of deep neural networks by training them on convex combinations of pairs of examples and their labels. This regularizes the network to favor simple linear behavior in-between training examples, which reduces memorization and increases robustness to adversarial examples. Mixup improves performance on ImageNet-2012, CIFAR-10, CIFAR-100, Google Commands, and UCI datasets, enhances robustness when learning from corrupted labels, and stabilizes the training of generative adversarial networks (GANs). It adds minimal computational overhead and applies to diverse tasks, including image classification, speech recognition, and tabular data; the method is data-agnostic and requires no domain-specific knowledge.

Mixup can be viewed as a form of vicinal risk minimization and is related to label smoothing in that both replace single hard labels with smoother targets. Unlike label smoothing, however, mixup ties the supervision signal linearly to the data augmentation: interpolated inputs receive correspondingly interpolated labels. The experiments show that mixup significantly improves generalization and robustness, and that increasing the interpolation strength α shrinks the generalization gap while raising the training error on real data, so very large α can lead to underfitting. Mixup also opens possibilities for further exploration in other supervised learning problems and in unsupervised learning.
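Concretely, mixup forms virtual training examples x̃ = λ·x_i + (1−λ)·x_j and ỹ = λ·y_i + (1−λ)·y_j, where (x_i, y_i) and (x_j, y_j) are two examples from the training data and λ is drawn from Beta(α, α). The sketch below shows one way to implement a mixup training step in PyTorch, assuming one-hot (soft) label targets and mixing each batch with a shuffled copy of itself; the helper names (mixup_batch, train_step) and the model, optimizer, and data they assume are illustrative, not the authors' reference implementation.

```python
import numpy as np
import torch
import torch.nn.functional as F

def mixup_batch(x, y_onehot, alpha=0.2):
    """Mix a batch with a shuffled copy of itself.

    x:        input batch, shape (B, ...)
    y_onehot: one-hot / soft targets, shape (B, num_classes),
              e.g. F.one_hot(y, num_classes).float()
    alpha:    Beta-distribution parameter; lam ~ Beta(alpha, alpha)
    """
    lam = np.random.beta(alpha, alpha)
    index = torch.randperm(x.size(0), device=x.device)
    x_mixed = lam * x + (1.0 - lam) * x[index]
    y_mixed = lam * y_onehot + (1.0 - lam) * y_onehot[index]
    return x_mixed, y_mixed

def train_step(model, optimizer, x, y_onehot, alpha=0.2):
    """One mixup training step (illustrative; model/optimizer assumed given)."""
    x_mixed, y_mixed = mixup_batch(x, y_onehot, alpha)
    logits = model(x_mixed)
    # Cross-entropy against soft targets: -sum(y * log_softmax(logits)).
    loss = -(y_mixed * F.log_softmax(logits, dim=1)).sum(dim=1).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In this variant a single λ is sampled per batch and the batch is mixed with a random permutation of itself, which avoids maintaining a second data loader; this is a common implementation choice rather than a requirement of the method.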