August 16, 2016, with very minor revisions on January 3, 2021 | CARL DOERSCH
This tutorial introduces Variational Autoencoders (VAEs), a popular approach for unsupervised learning of complex distributions. VAEs are appealing because they are built on top of standard function approximators (neural networks) and can be trained with stochastic gradient descent. They have shown promise in generating various types of data, including handwritten digits, faces, house numbers, CIFAR images, and physical models of scenes. The tutorial explains the intuition behind VAEs, the mathematics behind them, and their empirical behavior. It assumes no prior knowledge of variational Bayesian methods.
VAEs are latent variable models trained with an approximation to maximum likelihood. Latent variables are sampled from a simple prior, such as a standard Gaussian, and mapped to data through a deterministic function (a neural network) whose output parameterizes a distribution over observations. The key challenge is that the likelihood of the data under this generative model requires integrating over the latent variables, which is intractable to compute directly. VAEs therefore take a variational approach: they introduce an approximate posterior over the latents and minimize the Kullback-Leibler divergence between it and the true posterior, which yields a tractable lower bound on the log-likelihood.
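To make the generative side concrete, here is a minimal sketch of that mapping in PyTorch. The layer sizes and the Bernoulli-style output are illustrative assumptions for binarized images, not the tutorial's exact architecture.

```python
import torch
import torch.nn as nn

# Sketch of the generative half of a VAE: a latent z drawn from a standard
# Gaussian prior is mapped by a deterministic network to the parameters of a
# distribution over data -- here, per-pixel means for a binarized image.
class Decoder(nn.Module):
    def __init__(self, latent_dim=20, data_dim=784):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim, 400),
            nn.ReLU(),
            nn.Linear(400, data_dim),
            nn.Sigmoid(),          # pixel means in [0, 1]
        )

    def forward(self, z):
        return self.net(z)

decoder = Decoder()
z = torch.randn(16, 20)        # 16 samples from the prior N(0, I)
x_mean = decoder(z)            # parameters of p(x | z), one image per latent
```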
The VAE objective follows from this: minimizing the KL divergence between the approximate posterior and the true posterior is equivalent to maximizing the evidence lower bound (ELBO) on the log-likelihood. The ELBO consists of two terms: a reconstruction term, which measures how well the input can be reconstructed from latents drawn from the approximate posterior, and a KL divergence term, which keeps the approximate posterior close to the simple prior over the latent variables.
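In standard notation, with q(z | x) the approximate posterior, p(z) the prior, and p(x | z) the decoder's output distribution, the identity behind this objective is:

```latex
\log p(x) \;-\; D_{\mathrm{KL}}\!\left[\,q(z \mid x)\,\|\,p(z \mid x)\,\right]
  \;=\; \mathbb{E}_{z \sim q(z \mid x)}\!\left[\log p(x \mid z)\right]
  \;-\; D_{\mathrm{KL}}\!\left[\,q(z \mid x)\,\|\,p(z)\,\right]
```

Because the KL divergence on the left-hand side is non-negative, the right-hand side is a lower bound on log p(x); maximizing it with respect to both the generative model and the approximate posterior is the VAE training objective.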
The tutorial also discusses the reparameterization trick, which allows for efficient gradient computation in VAEs by transforming the sampling process into a differentiable operation. This trick enables the use of stochastic gradient descent to optimize the ELBO. The tutorial further explores the interpretation of the objective function in terms of information theory and discusses the regularization aspects of VAEs.
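A minimal sketch of the trick for a Gaussian approximate posterior, assuming the common log-variance parameterization of the encoder outputs (an illustrative convention, not mandated by the tutorial):

```python
import torch

# Reparameterization trick: instead of sampling z ~ N(mu, sigma^2) directly,
# sample eps ~ N(0, I) and compute z deterministically from mu and sigma,
# so gradients flow back into the encoder outputs.
def reparameterize(mu, log_var):
    std = torch.exp(0.5 * log_var)   # sigma = exp(log_var / 2)
    eps = torch.randn_like(std)      # all randomness lives in eps
    return mu + eps * std            # differentiable in mu and log_var

# Closed-form KL divergence between N(mu, sigma^2) and N(0, I), summed over
# latent dimensions -- the second ELBO term for a Gaussian posterior.
def kl_to_standard_normal(mu, log_var):
    return -0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp(), dim=1)

mu = torch.zeros(16, 20)             # e.g. encoder outputs for a batch
log_var = torch.zeros(16, 20)
z = reparameterize(mu, log_var)      # one differentiable sample per example
```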
Finally, the tutorial presents examples of VAEs applied to the MNIST dataset, demonstrating their ability to generate realistic images from random noise. It also introduces conditional VAEs, which can generate data conditioned on additional information, such as partial inputs. The tutorial concludes with a proof showing that VAEs can achieve zero approximation error given sufficiently powerful learners.
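For the conditional case, the same bound carries over with every distribution conditioned on the extra input X (standard CVAE notation for the "additional information" mentioned above):

```latex
\log p(Y \mid X) \;\ge\; \mathbb{E}_{z \sim q(z \mid Y, X)}\!\left[\log p(Y \mid z, X)\right]
  \;-\; D_{\mathrm{KL}}\!\left[\,q(z \mid Y, X)\,\|\,p(z \mid X)\,\right]
```

At generation time one samples z from the prior and decodes it together with the conditioning input, for example the observed part of a partially visible image; in practice the prior p(z | X) is often fixed to N(0, I).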