16 Dec 2020 | Jonathan Ho, Ajay Jain, Pieter Abbeel
This paper presents diffusion probabilistic models, a class of latent variable models inspired by nonequilibrium thermodynamics, which achieve high-quality image synthesis. The authors train their models on a weighted variational bound, which connects diffusion models to denoising score matching with Langevin dynamics. Their models naturally admit a progressive lossy decompression scheme, which can be interpreted as a generalization of autoregressive decoding. On the unconditional CIFAR10 dataset, the model achieves an Inception score of 9.46 and a state-of-the-art FID score of 3.17. On 256x256 LSUN, the model achieves sample quality similar to ProgressiveGAN. The implementation is available at https://github.com/hojonathanho/diffusion.
The paper discusses diffusion models, which are latent variable models trained with variational inference to produce samples matching the data after finite time. The forward process gradually adds Gaussian noise to the data according to a fixed variance schedule, while the learned reverse process removes that noise step by step. The authors show that diffusion models can generate high-quality samples, sometimes better than other generative models. They also show that a certain parameterization of diffusion models reveals an equivalence with denoising score matching over multiple noise levels during training and with annealed Langevin dynamics during sampling. The authors obtained their best sample quality results using this parameterization, so they consider this equivalence to be one of their primary contributions.
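The forward process has a convenient closed form: x_t can be sampled directly from x_0 in one shot rather than by simulating every intermediate step. Below is a minimal PyTorch sketch assuming the linear beta schedule from 1e-4 to 0.02 over T = 1000 steps used in the paper; the helper name `q_sample` is ours, chosen for illustration.

```python
import torch

# Fixed linear variance schedule (as in the paper): beta_1 = 1e-4 ... beta_T = 0.02, T = 1000.
T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alphabars = torch.cumprod(alphas, dim=0)  # \bar{alpha}_t = prod_{s <= t} alpha_s

def q_sample(x0, t, noise=None):
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(abar_t) x_0, (1 - abar_t) I) in one shot."""
    if noise is None:
        noise = torch.randn_like(x0)
    abar = alphabars[t].view(-1, *([1] * (x0.dim() - 1)))  # broadcast over image dims
    return abar.sqrt() * x0 + (1.0 - abar).sqrt() * noise
```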
Despite their sample quality, the authors' models do not have competitive log likelihoods compared to other likelihood-based models. They find that the majority of their models' lossless codelengths are consumed to describe imperceptible image details. They present a more refined analysis of this phenomenon in the language of lossy compression and show that the sampling procedure of diffusion models is a type of progressive decoding that resembles autoregressive decoding along a bit ordering that vastly generalizes what is normally possible with autoregressive models.
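To make the progressive-decoding picture concrete, here is a hedged sketch of ancestral sampling from the reverse process, continuing from the schedule defined above: at every step the network's noise prediction implies an estimate of x_0, and these estimates sharpen from coarse structure to fine detail as t decreases. `eps_theta` stands for a trained noise-prediction network and is assumed rather than defined here, and the fixed variance choice sigma_t^2 = beta_t is one of the two options the paper considers.

```python
@torch.no_grad()
def progressive_decode(eps_theta, shape):
    """Ancestral sampling from the reverse process, keeping the implied x_0 estimate at each step."""
    x = torch.randn(shape)  # start from pure noise x_T ~ N(0, I)
    x0_estimates = []
    for t in reversed(range(T)):
        eps = eps_theta(x, torch.full((shape[0],), t))
        abar, alpha, beta = alphabars[t], alphas[t], betas[t]
        # Estimate of x_0 implied by the current x_t and the predicted noise.
        x0_hat = (x - (1.0 - abar).sqrt() * eps) / abar.sqrt()
        x0_estimates.append(x0_hat)
        # Posterior mean mu_theta(x_t, t), then add noise with sigma_t^2 = beta_t (none at t = 0).
        mean = (x - beta / (1.0 - abar).sqrt() * eps) / alpha.sqrt()
        noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        x = mean + beta.sqrt() * noise
    return x, x0_estimates
```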
The paper also discusses the connection between diffusion models and denoising autoencoders, showing that the reverse process can be parameterized to predict either the mean or the noise. The authors find that predicting the noise performs approximately as well as predicting the mean when trained on the variational bound with fixed variances, but much better when trained with their simplified objective. They also discuss the connection between diffusion models and autoregressive models, showing that the Gaussian diffusion model can be interpreted as a kind of autoregressive model with a generalized bit ordering that cannot be expressed by reordering data coordinates.
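The simplified objective mentioned above reduces to a mean-squared error on the predicted noise at a uniformly sampled timestep. A minimal sketch, reusing `q_sample` from the earlier snippet and again assuming a trained noise-prediction network `eps_theta`:

```python
def simple_loss(eps_theta, x0):
    """L_simple: predict the noise added to x_0 at a uniformly sampled timestep t."""
    t = torch.randint(0, T, (x0.shape[0],))
    eps = torch.randn_like(x0)
    x_t = q_sample(x0, t, noise=eps)
    return ((eps - eps_theta(x_t, t)) ** 2).mean()
```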
The authors also discuss the connection between diffusion models and energy-based models, showing that the Gaussian diffusion model can be interpreted as a kind of energy-based model. They further show that the diffusion model supports progressive lossy compression, where most of the bits are allocated to imperceptible distortions. Finally, they discuss the connection between diffusion models and interpolation, showing that source images can be interpolated in latent space by diffusing them with the forward process and decoding the interpolated latent with the reverse process, which yields smooth, plausible interpolations in image space.
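A rough sketch of that interpolation procedure, continuing with the helpers defined earlier: diffuse both source images to an intermediate step with the forward process, mix the latents linearly, then decode with the reverse process. The shared noise and the default `t_start = 500` are illustrative assumptions, not prescriptions from the paper.

```python
@torch.no_grad()
def interpolate(eps_theta, x0_a, x0_b, lam=0.5, t_start=500):
    """Interpolate two images in latent space, then decode from step t_start."""
    noise = torch.randn_like(x0_a)              # shared noise (an illustrative choice)
    t_enc = torch.tensor([t_start])
    xt_a = q_sample(x0_a, t_enc, noise)
    xt_b = q_sample(x0_b, t_enc, noise)
    x = (1.0 - lam) * xt_a + lam * xt_b          # linear mix of the two latents
    for t in reversed(range(t_start + 1)):       # decode from step t_start down to 0
        eps = eps_theta(x, torch.full((x.shape[0],), t))
        abar, alpha, beta = alphabars[t], alphas[t], betas[t]
        mean = (x - beta / (1.0 - abar).sqrt() * eps) / alpha.sqrt()
        step_noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        x = mean + beta.sqrt() * step_noise
    return x
```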