Cascaded Diffusion Models for High Fidelity Image Generation

17 Dec 2021 | Jonathan Ho*, Chitwan Saharia*, William Chan, David J. Fleet, Mohammad Norouzi, Tim Salimans
The paper introduces Cascaded Diffusion Models (CDMs), an approach to generating high-fidelity images on the class-conditional ImageNet benchmark. A CDM is a pipeline of diffusion models that generate images at increasing resolutions: a standard diffusion model produces samples at the lowest resolution, and one or more super-resolution diffusion models then upsample the image and add higher-resolution detail.

The key contribution is *conditioning augmentation*: applying data augmentation, such as Gaussian noise or Gaussian blurring, to the low-resolution conditioning inputs of the super-resolution models during training. This prevents each super-resolution stage from overfitting to the artifacts of the previous stage's outputs, curbing the compounding of errors along the cascade and improving sample quality. The authors find the technique crucial for achieving high-quality samples at the highest resolution.

Experimentally, CDMs reach FID scores of 1.48 at 64×64, 3.52 at 128×128, and 4.88 at 256×256, outperforming BigGAN-deep. They also achieve classification accuracy scores of 63.02% (top-1) and 84.06% (top-5) at 256×256, surpassing VQ-VAE-2. The paper further compares different forms of conditioning augmentation, including Gaussian noise and Gaussian blurring, and provides detailed hyperparameters and architectures for each model in the pipeline.
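To make the mechanism concrete, here is a minimal PyTorch sketch of Gaussian conditioning augmentation for one super-resolution stage: the upsampled low-resolution conditioning image is corrupted with a few steps of the forward diffusion process before being fed to the denoiser. The function name, the linear noise schedule, the truncation range, and the tensor shapes are assumptions for illustration, not the authors' released implementation.

```python
import torch
import torch.nn.functional as F

def gaussian_conditioning_augmentation(z, s, alphas_cumprod):
    """Corrupt the conditioning image z with s steps of forward diffusion:
    z_s = sqrt(abar_s) * z + sqrt(1 - abar_s) * noise.
    Truncated augmentation samples s from a small range during training;
    at sampling time a fixed s is applied to the previous stage's output.
    (Hypothetical helper, not the authors' exact code.)"""
    abar = alphas_cumprod[s].view(-1, 1, 1, 1)   # cumulative signal level per example
    noise = torch.randn_like(z)
    return abar.sqrt() * z + (1.0 - abar).sqrt() * noise

# A standard linear beta schedule (an assumption here, not taken from the paper).
T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)

# Hypothetical training step for a 64x64 -> 256x256 super-resolution stage:
# upsample the low-res image, augment it, then condition the denoiser on
# z_aug alongside the noisy high-res target.
x_lowres = torch.randn(8, 3, 64, 64)             # stand-in for a training batch
z = F.interpolate(x_lowres, size=(256, 256), mode="bilinear", align_corners=False)
s = torch.randint(0, T // 10, (8,))              # truncated range of augmentation steps
z_aug = gaussian_conditioning_augmentation(z, s, alphas_cumprod)
```

Because the same corruption is applied to the lower-resolution model's samples at generation time, the super-resolution model never sees a cleaner conditioning signal at test time than it saw during training, which is what blocks errors from compounding across the cascade.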