TOWARDS PRINCIPLED METHODS FOR TRAINING GENERATIVE ADVERSARIAL NETWORKS

17 Jan 2017 | Martin Arjovsky, Léon Bottou
This paper aims to understand the training dynamics of generative adversarial networks (GANs) through theoretical analysis and targeted experiments, addressing the instability and saturation issues that arise during GAN training. It is divided into three parts: an introduction, a theoretical study of the problems in GAN training, and a discussion of practical, theoretically grounded directions for addressing them.

The paper begins by highlighting how difficult GANs are to train despite their success at generating realistic images. Traditional generative modeling methods rely on maximizing likelihood, which amounts to minimizing the Kullback-Leibler (KL) divergence between the data and model distributions, whereas GANs optimize the Jensen-Shannon divergence (JSD), a symmetric measure between the two distributions.

The paper then examines the theoretical reasons behind the instability of GAN training. It shows that when the supports of the data and generated distributions are disjoint or lie on low-dimensional manifolds, a perfect discriminator can be constructed; such an optimal discriminator is constant almost everywhere on the supports of the two distributions, so the generator receives vanishing gradients. Turning to the consequences of the choice of generator cost function, the paper shows that the original cost, log(1 - D(G(z))), leads to vanishing gradients as the discriminator improves, while the widely used alternative based on -log D(G(z)) avoids saturation but leads to unstable updates.
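As intuition for this saturation behaviour, here is a minimal NumPy sketch (not code from the paper) comparing the gradient that each generator cost sends back through the discriminator's logit; the sigmoid output and the particular logit values are illustrative assumptions.

```python
# Minimal sketch: gradient of the two generator losses w.r.t. the
# discriminator logit a, where D(G(z)) = sigmoid(a).
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# Very negative logits mean the discriminator confidently rejects the
# generated samples, i.e. D(G(z)) is close to 0.
logits = np.array([0.0, -2.0, -5.0, -10.0])
D = sigmoid(logits)

# d/da log(1 - sigmoid(a)) = -sigmoid(a)      -> vanishes as D -> 0
grad_original = -D
# d/da (-log sigmoid(a))   = sigmoid(a) - 1   -> stays near -1 as D -> 0
grad_alternative = D - 1.0

for a, d, g1, g2 in zip(logits, D, grad_original, grad_alternative):
    print(f"logit={a:6.1f}  D(G(z))={d:.5f}  "
          f"grad[log(1-D)]={g1:+.5f}  grad[-log D]={g2:+.5f}")
```

As the discriminator becomes confident that generated samples are fake, the gradient of log(1 - D) shrinks toward zero, while the gradient of -log D does not vanish; the paper shows that this alternative nonetheless has its own pathology, producing noisy, unstable updates.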
As a remedy, the paper proposes adding continuous noise to the inputs of the discriminator. This smooths the probability mass of both distributions, gives their supports a common overlap, and stabilizes training. The paper concludes by introducing the Wasserstein metric as an alternative to the JSD for measuring the distance between distributions: it shows that the Wasserstein distance can be used to evaluate generative models and provides a theoretical bound on the distance between the true and generated distributions. Throughout, the paper stresses that understanding the theoretical foundations of GAN training is essential for developing more stable and effective methods.
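The noise proposal is simple to implement. Below is a hedged sketch under stated assumptions (the function name, the Gaussian noise model, and the annealing schedule are illustrative choices, not code from the paper) of perturbing both real and generated samples before they reach the discriminator, so that the two distributions overlap and the discriminator can no longer become perfect.

```python
# Sketch: add continuous (Gaussian) noise to discriminator inputs.
import numpy as np

rng = np.random.default_rng(0)

def add_input_noise(batch, sigma):
    """Perturb a batch with zero-mean Gaussian noise of standard deviation
    sigma before it is passed to the discriminator."""
    return batch + sigma * rng.standard_normal(batch.shape)

# Illustrative usage in a training step (D and G denote the discriminator and
# generator forward passes, which are not defined in this sketch):
#   sigma = sigma_0 * decay ** step               # optionally anneal the noise
#   d_real = D(add_input_noise(real_batch, sigma))
#   d_fake = D(add_input_noise(G(z), sigma))
```

The noise magnitude is a free parameter; one natural choice, consistent with the paper's discussion, is to start with a relatively large sigma and decrease it as training progresses.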