Cascaded Diffusion Models for High Fidelity Image Generation


17 Dec 2021 | Jonathan Ho*, Chitwan Saharia*, William Chan, David J. Fleet, Mohammad Norouzi, Tim Salimans
Cascaded diffusion models generate high-fidelity images on the class-conditional ImageNet benchmark without any auxiliary classifiers. The approach uses a pipeline of diffusion models of progressively increasing resolution: a base model generates images at low resolution, and one or more super-resolution diffusion models then upsample the result, each conditioned on the output of the previous stage.

The key ingredient is conditioning augmentation: data augmentation applied to the lower-resolution conditioning inputs of the super-resolution models. Because each super-resolution model is trained on ground-truth low-resolution images but sampled with generated ones, there is a train-test mismatch, and errors made by earlier stages would otherwise compound through the cascade. Conditioning augmentation mitigates this mismatch; the paper applies Gaussian noise augmentation for low-resolution upsampling and Gaussian blurring for high-resolution upsampling. It also compares truncated and non-truncated variants of conditioning augmentation and finds the non-truncated form more practical, since the low-resolution samples can be generated once and reused while sweeping over augmentation levels. A sketch of the resulting pipeline appears below.

With these techniques, cascaded diffusion models achieve FID scores of 1.48 at 64×64, 3.52 at 128×128, and 4.88 at 256×256, outperforming BigGAN-deep and VQ-VAE-2. They also reach Classification Accuracy Scores of 63.02% (top-1) and 84.06% (top-5) at 256×256. The cascading pipeline is shown to be effective on both ImageNet and LSUN, demonstrating the general applicability of conditioning augmentation and its central role in generating high-quality samples without classifier guidance.
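To make the pipeline concrete, here is a minimal sketch in NumPy of cascaded sampling with non-truncated Gaussian-noise conditioning augmentation. The functions `base_sample`, `superres_sample`, `augment_conditioning`, and `cascade_sample` are hypothetical stand-ins (the paper's stages are large conditional U-Net diffusion models); only the data flow, a base sample followed by noise-augmented conditioning and super-resolution, follows the paper's description.

```python
import numpy as np

rng = np.random.default_rng(0)

def base_sample(class_label, shape=(64, 64, 3)):
    """Stand-in for the base diffusion model: returns a 64x64 'sample'.
    A real implementation would run the full reverse diffusion chain."""
    return rng.standard_normal(shape)

def upsample(x, factor=2):
    """Nearest-neighbor upsampling of the conditioning image."""
    return np.repeat(np.repeat(x, factor, axis=0), factor, axis=1)

def superres_sample(cond, out_shape):
    """Stand-in for a super-resolution diffusion model conditioned on `cond`.
    Here: upsample plus noise; really another reverse diffusion chain."""
    up = upsample(cond)[: out_shape[0], : out_shape[1]]
    return up + 0.1 * rng.standard_normal(out_shape)

def augment_conditioning(x, alpha_bar=0.9):
    """Non-truncated Gaussian-noise conditioning augmentation: corrupt the
    previous stage's output with the forward-diffusion style corruption
    sqrt(alpha_bar) * x + sqrt(1 - alpha_bar) * eps before conditioning
    the next stage, so the super-res model is robust to base-model errors."""
    eps = rng.standard_normal(x.shape)
    return np.sqrt(alpha_bar) * x + np.sqrt(1.0 - alpha_bar) * eps

def cascade_sample(class_label, alpha_bars=(0.9, 0.95)):
    """Run the 64 -> 128 -> 256 cascade, applying conditioning
    augmentation between stages (one alpha_bar per super-res stage)."""
    x = base_sample(class_label)                        # 64x64 base sample
    for alpha_bar, size in zip(alpha_bars, (128, 256)):
        cond = augment_conditioning(x, alpha_bar)       # corrupt conditioning
        x = superres_sample(cond, (size, size, 3))      # super-resolve
    return x

img = cascade_sample(class_label=207)
print(img.shape)  # (256, 256, 3)
```

During training, the augmentation level would be drawn at random per example (with the model told the level it was given); at sampling time it is fixed via a hyperparameter sweep. Because the augmentation here is applied after the base sample is drawn, that sweep does not require re-running the base model, which is why the non-truncated variant is the more practical choice.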