PaGoDA 🏰: Progressive Growing of a One-Step Generator from a Low-Resolution Diffusion Teacher

29 Oct 2024 | Dongjun Kim*,†, Chieh-Hsin Lai*, Wei-Hsiang Liao, Yuhua Takida, Naoki Murata, Toshimitsu Uesaka, Yuki Mitsufuji, Stefano Ermon
PaGoDA (Progressive Growing of a One-Step Generator from a Low-Resolution Diffusion Teacher) is a pipeline designed to reduce the computational cost of training diffusion models, particularly for high-dimensional content generation. The pipeline consists of three stages: pretraining a diffusion model on downsampled data, distilling the pretrained model into a one-step generator, and progressively upscaling the generator to higher resolutions. By training on downsampled data, PaGoDA reduces the computational budget 64× compared to training on full-resolution data. The distillation stage uses a reconstruction loss that maps latent representations back to real data, ensuring better alignment with the real data distribution. The upscaling stage trains additional networks to increase resolution while maintaining stability and quality. PaGoDA achieves state-of-the-art performance on ImageNet across multiple resolutions and is also effective for text-to-image generation, achieving competitive results without additional stabilization techniques. The code for PaGoDA is available at <https://github.com/sony/pagoda>.
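The three stages above can be illustrated with a minimal, purely conceptual sketch. This is not the paper's implementation: the averaging downsampler, the toy L2 reconstruction loss, the constant-offset stand-in for the one-step generator, and the nearest-neighbour upsampler (standing in for the learned super-resolution networks) are all hypothetical simplifications. Note that an 8× reduction per side yields 64× fewer pixels, matching the stated budget reduction.

```python
import numpy as np

def downsample(x, factor=8):
    """Average-pool an HxW image by `factor` per side (8x per side => 64x fewer pixels)."""
    h, w = x.shape
    return x.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

def reconstruction_loss(generated, real_low):
    """Stage-2-style reconstruction objective (toy L2; not the paper's exact loss)."""
    return float(np.mean((generated - real_low) ** 2))

def upscale(x, factor=2):
    """Stage-3 stand-in: nearest-neighbour upsampling in place of a learned upscaling network."""
    return np.repeat(np.repeat(x, factor, axis=0), factor, axis=1)

# Toy walk-through of the three stages on a single 64x64 "image".
rng = np.random.default_rng(0)
real = rng.random((64, 64))
low = downsample(real)                 # stage 1: pretrain the teacher at 8x8 resolution
fake = low + 0.01                      # hypothetical one-step generator output
loss = reconstruction_loss(fake, low)  # stage 2: distillation via reconstruction
high = upscale(low, factor=8)          # stage 3: progressively grow back to 64x64
print(low.shape, high.shape, round(loss, 4))  # → (8, 8) (64, 64) 0.0001
```

In the actual method each stage involves training a network (a diffusion teacher, a distilled one-step student, and progressively grown upscaling layers); the sketch only traces the data flow and resolution bookkeeping between stages.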