PaGoDA: Progressive Growing of a One-Step Generator from a Low-Resolution Diffusion Teacher


29 Oct 2024 | Dongjun Kim, Chieh-Hsin Lai, Wei-Hsiang Liao, Yuhta Takida, Naoki Murata, Toshimitsu Uesaka, Yuki Mitsufuji, Stefano Ermon
PaGoDA is a pipeline that reduces the training cost of diffusion models through three stages: pretraining a diffusion model on downsampled data, distilling the pretrained model into a one-step generator, and progressively growing that generator via super-resolution. Pretraining on 8× downsampled data cuts training cost by roughly 64×, since per-iteration cost scales with pixel count. PaGoDA achieves state-of-the-art performance on ImageNet at every resolution from 64×64 to 512×512, as well as strong results on text-to-image generation. The pipeline can also be applied directly in latent space, adding further compression on top of the pre-trained autoencoder of Latent Diffusion Models.

PaGoDA's training combines a reconstruction loss with an adversarial loss to keep distillation stable while preserving generation quality, and it incorporates classifier-free guidance for controllable text-to-image generation. The result is an efficient, practical recipe for scalable diffusion-model training across a range of computational budgets.
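The cost and resolution claims above follow from simple arithmetic: downsampling by a factor of 8 in each spatial dimension shrinks the pixel count by 8² = 64, and the one-step generator is then grown by doubling the resolution stage by stage. A minimal sketch, assuming per-iteration training cost scales linearly with pixel count (the function names here are illustrative, not from the paper):

```python
# Illustrative sketch of PaGoDA's cost arithmetic and progressive
# resolution schedule; names are hypothetical, not the paper's code.

def cost_reduction(downsample_factor: int) -> int:
    """Approximate per-iteration training-cost reduction from spatial
    downsampling, assuming cost scales linearly with pixel count."""
    return downsample_factor ** 2

def progressive_schedule(start_res: int, target_res: int) -> list[int]:
    """Resolutions visited while progressively growing the one-step
    generator, doubling spatial size at each stage."""
    schedule = [start_res]
    while schedule[-1] < target_res:
        schedule.append(schedule[-1] * 2)
    return schedule

print(cost_reduction(8))              # 8x downsampling -> 64x cheaper
print(progressive_schedule(64, 512))  # [64, 128, 256, 512]
```

This matches the paper's headline numbers: an 8× downsampled teacher yields the quoted 64× cost reduction, and the progressive stages cover the reported 64×64 through 512×512 ImageNet resolutions.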