14 Jan 2021 | Mikolaj Bińkowski, Danica J. Sutherland, Michael Arbel, Arthur Gretton
The paper investigates the training and performance of Generative Adversarial Networks (GANs) using the Maximum Mean Discrepancy (MMD) as the critic, termed MMD GANs. It clarifies the issue of bias in GAN loss functions, showing that gradient estimators used in MMD GANs and Wasserstein GANs are unbiased, but learning a discriminator based on samples leads to biased gradients for the generator parameters. The paper discusses kernel choice for the MMD critic and characterizes the kernel corresponding to the energy distance used in the Cramér GAN critic. As an integral probability metric, MMD benefits from training strategies developed for Wasserstein GANs. Experiments show that MMD GANs can use smaller critic networks, resulting in simpler and faster training with matching performance. The paper also proposes the Kernel Inception Distance (KID) as an improved measure of GAN convergence and shows how to use it for dynamic learning rate adaptation.
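To make the critic concrete, here is a minimal NumPy sketch of the unbiased MMD² estimator that an MMD GAN critic is built around. The rational-quadratic kernel, the fixed alpha, and the function names are illustrative choices for this sketch rather than the paper's exact experimental configuration; in an MMD GAN the kernel is applied to learned critic features rather than raw samples, and the generator is trained to make this statistic small.

```python
import numpy as np

def rq_kernel(X, Y, alpha=1.0):
    """Rational-quadratic kernel k(x, y) = (1 + ||x - y||^2 / (2*alpha))^(-alpha)."""
    sq_dists = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Y**2, axis=1)[None, :]
        - 2.0 * (X @ Y.T)
    )
    sq_dists = np.maximum(sq_dists, 0.0)  # guard against small negative values from rounding
    return (1.0 + sq_dists / (2.0 * alpha)) ** (-alpha)

def mmd2_unbiased(X, Y, kernel=rq_kernel):
    """Unbiased estimator of MMD^2 between samples X (m x d) and Y (n x d)."""
    m, n = X.shape[0], Y.shape[0]
    Kxx, Kyy, Kxy = kernel(X, X), kernel(Y, Y), kernel(X, Y)
    # Drop diagonal terms so the within-sample averages are unbiased.
    term_xx = (Kxx.sum() - np.trace(Kxx)) / (m * (m - 1))
    term_yy = (Kyy.sum() - np.trace(Kyy)) / (n * (n - 1))
    return term_xx + term_yy - 2.0 * Kxy.mean()
```

Because this estimator is itself unbiased, the bias discussed above does not come from the MMD estimate but from learning the critic's features on the same finite samples.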
The paper discusses the theoretical and practical aspects of MMD GANs, including the use of integral probability metrics, the role of witness functions, and the impact of gradient penalties. It shows that the MMD is itself an integral probability metric and that the gradient penalty of Gulrajani et al. applies to MMD GANs as well. Compared with Wasserstein GANs and the Cramér GAN, MMD GANs can achieve similar performance with smaller critic networks.
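Since the gradient penalty of Gulrajani et al. figures prominently here, the PyTorch sketch below shows one plausible way to penalize the gradient norm of the empirical MMD witness function at interpolated points, in the spirit of WGAN-GP. The kernel choice, interpolation scheme, penalty weight, and the assumption that `critic` maps a batch to a 2-D feature matrix are illustrative, not a verbatim transcription of the paper's training setup.

```python
import torch

def rq_kernel(a, b, alpha=1.0):
    """Pairwise rational-quadratic kernel between rows of a and rows of b."""
    sq_dists = torch.cdist(a, b) ** 2
    return (1.0 + sq_dists / (2.0 * alpha)) ** (-alpha)

def witness(x, real_feats, fake_feats, critic, kernel=rq_kernel):
    """Empirical MMD witness f(x) = mean_i k(h(x), h(x_i)) - mean_j k(h(x), h(y_j))."""
    hx = critic(x)
    return kernel(hx, real_feats).mean(dim=1) - kernel(hx, fake_feats).mean(dim=1)

def mmd_gradient_penalty(real, fake, critic, kernel=rq_kernel, weight=10.0):
    """WGAN-GP-style penalty on the gradient norm of the MMD witness at interpolates."""
    eps = torch.rand(real.size(0), *([1] * (real.dim() - 1)), device=real.device)
    interp = (eps * real + (1.0 - eps) * fake).requires_grad_(True)
    with torch.no_grad():  # treat the empirical means in the witness as constants
        real_feats = critic(real)
        fake_feats = critic(fake)
    f = witness(interp, real_feats, fake_feats, critic, kernel)
    grads, = torch.autograd.grad(f.sum(), interp, create_graph=True)
    grad_norm = grads.reshape(grads.size(0), -1).norm(dim=1)
    return weight * ((grad_norm - 1.0) ** 2).mean()
```

The penalty would be added to the critic's loss alongside the (negated) unbiased MMD² estimate; the weight of 10 mirrors the common WGAN-GP default rather than anything specific to this paper.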
The paper evaluates MMD GANs on MNIST, CIFAR-10, LSUN, and CelebA, showing that MMD GANs with certain kernels outperform other GAN variants. It also discusses the metrics used to assess GAN performance: the Inception score, FID, and the proposed KID. MMD GANs with suitable kernels score well on these metrics, and KID, unlike FID, admits a simple unbiased estimator. The paper concludes that MMD GANs are a promising approach to generative modeling, capable of producing high-quality samples with simpler and faster training.
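For reference, here is a rough NumPy sketch of a KID-style estimator: an unbiased MMD² with a polynomial kernel on Inception features, averaged over random subsets. The degree-3 kernel, subset sizes, and function names are assumptions made for illustration, and real_feats / fake_feats stand for pre-extracted Inception representations of real and generated images.

```python
import numpy as np

def polynomial_kernel(X, Y, degree=3, coef0=1.0):
    """Polynomial kernel k(x, y) = (x . y / d + 1)^degree, with d the feature dimension."""
    d = X.shape[1]
    return (X @ Y.T / d + coef0) ** degree

def mmd2_unbiased_poly(X, Y):
    """Unbiased MMD^2 estimate between feature sets X (m x d) and Y (n x d)."""
    m, n = X.shape[0], Y.shape[0]
    Kxx, Kyy, Kxy = polynomial_kernel(X, X), polynomial_kernel(Y, Y), polynomial_kernel(X, Y)
    term_xx = (Kxx.sum() - np.trace(Kxx)) / (m * (m - 1))
    term_yy = (Kyy.sum() - np.trace(Kyy)) / (n * (n - 1))
    return term_xx + term_yy - 2.0 * Kxy.mean()

def kid_score(real_feats, fake_feats, n_subsets=10, subset_size=1000, seed=0):
    """Average the unbiased MMD^2 over random subsets of the Inception features."""
    rng = np.random.default_rng(seed)
    subset_size = min(subset_size, len(real_feats), len(fake_feats))
    scores = [
        mmd2_unbiased_poly(
            real_feats[rng.choice(len(real_feats), subset_size, replace=False)],
            fake_feats[rng.choice(len(fake_feats), subset_size, replace=False)],
        )
        for _ in range(n_subsets)
    ]
    return float(np.mean(scores))
```

Because each subset estimate is unbiased, the score does not systematically shrink or grow with the number of samples used, which is the property that makes a KID-style metric attractive for tracking convergence and driving learning rate adaptation.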