AUGUST 2022 | Florinel-Alin Croitoru, Vlad Hondru, Radu Tudor Ionescu, Member, IEEE, and Mubarak Shah, Fellow, IEEE
This survey provides a comprehensive review of denoising diffusion models (DDMs) in computer vision, covering both theoretical and practical contributions. DDMs are deep generative models that operate in two stages: a forward diffusion stage, where input data is gradually corrupted by adding Gaussian noise, and a reverse diffusion stage, where the model learns to reverse the process step by step to reconstruct the original data. Despite their computational demands, DDMs are praised for generating high-quality and diverse samples. The paper identifies three main frameworks for DDMs: denoising diffusion probabilistic models (DDPMs), noise conditioned score networks (NCSNs), and stochastic differential equations (SDEs). It also discusses the relationships between DDMs and other deep generative models, such as variational autoencoders (VAEs), generative adversarial networks (GANs), and normalizing flows. The paper categorizes DDMs from multiple perspectives, including the task they are applied to, the input signals they require, and the underlying framework. It highlights the current limitations of DDMs, such as slow inference times, and suggests future research directions. The survey also presents various applications of DDMs in image generation, super-resolution, inpainting, image editing, and segmentation. The paper concludes with a detailed discussion of the key components of DDMs, including the forward and reverse processes, the training objectives, and the sampling methods.
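The forward diffusion stage described above can be sketched in a few lines. The snippet below is a minimal illustration, not the survey's own code: it assumes a linear noise schedule (one common choice among several) and uses the closed-form property of the Gaussian forward process, under which the noisy sample at any step t can be drawn directly from the clean input. The function and variable names here are hypothetical.

```python
import numpy as np

def forward_diffusion(x0, t, betas, rng=None):
    """Sample x_t from the closed-form forward process
    q(x_t | x_0) = N(sqrt(alpha_bar_t) * x_0, (1 - alpha_bar_t) * I),
    i.e. the clean input x0 after t steps of Gaussian corruption."""
    rng = rng or np.random.default_rng()
    alphas = 1.0 - betas
    alpha_bar = np.cumprod(alphas)[t]          # cumulative product up to step t
    noise = rng.standard_normal(x0.shape)      # epsilon ~ N(0, I)
    xt = np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * noise
    return xt, noise

# Illustrative linear schedule (an assumption; schedules vary across papers).
T = 1000
betas = np.linspace(1e-4, 0.02, T)
x0 = np.ones((8, 8))                           # toy "image"
xt, eps = forward_diffusion(x0, T - 1, betas)  # near step T, xt is almost pure noise
```

The reverse diffusion stage is the learned counterpart: a neural network is trained to predict (some function of) the added noise at each step, so that the corruption can be undone step by step at sampling time.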