AUGUST 2022 | Florinel-Alin Croitoru, Vlad Hondru, Radu Tudor Ionescu, Member, IEEE, and Mubarak Shah, Fellow, IEEE
This survey provides a comprehensive review of denoising diffusion models (DDMs) in computer vision, covering both theoretical and practical contributions. DDMs are deep generative models that operate in two stages: a forward diffusion stage, where input data is gradually corrupted by adding Gaussian noise, and a reverse diffusion stage, where the model learns to reverse the process step by step to reconstruct the original data. Despite their computational demands, DDMs are praised for generating high-quality and diverse samples. The paper identifies three main frameworks for DDMs: denoising diffusion probabilistic models (DDPMs), noise conditioned score networks (NCSNs), and stochastic differential equations (SDEs). It also discusses the relationships between DDMs and other deep generative models, such as variational autoencoders (VAEs), generative adversarial networks (GANs), and normalizing flows. The paper categorizes DDMs from multiple perspectives, including the task they are applied to, the input signals they require, and the underlying framework. It highlights the current limitations of DDMs, such as slow inference times, and suggests future research directions. The survey also presents various applications of DDMs in image generation, super-resolution, inpainting, image editing, and segmentation. The paper concludes with a detailed discussion of the key components of DDMs, including the forward and reverse processes, the training objectives, and the sampling methods.
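The forward diffusion stage described above can be sketched in a few lines. The snippet below is a minimal illustration, not the survey's own code: it assumes a linear noise schedule (one common choice among several) and uses the closed-form property of the Gaussian forward process, under which the noisy sample at any step t can be drawn directly from the clean input. The function and variable names here are hypothetical.

```python
import numpy as np

def forward_diffusion(x0, t, betas, rng=None):
    """Sample x_t from the closed-form forward process
    q(x_t | x_0) = N(sqrt(alpha_bar_t) * x_0, (1 - alpha_bar_t) * I),
    i.e. the clean input x0 after t steps of Gaussian corruption."""
    rng = rng or np.random.default_rng()
    alphas = 1.0 - betas
    alpha_bar = np.cumprod(alphas)[t]          # cumulative product up to step t
    noise = rng.standard_normal(x0.shape)      # epsilon ~ N(0, I)
    xt = np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * noise
    return xt, noise

# Illustrative linear schedule (an assumption; schedules vary across papers).
T = 1000
betas = np.linspace(1e-4, 0.02, T)
x0 = np.ones((8, 8))                           # toy "image"
xt, eps = forward_diffusion(x0, T - 1, betas)  # near step T, xt is almost pure noise
```

The reverse diffusion stage is the learned counterpart: a neural network is trained to predict (some function of) the added noise at each step, so that the corruption can be undone step by step at sampling time.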