27 May 2024 | Sirui Xie, Zhiheng Xiao, Diederik P. Kingma, Tingbo Hou, Ying Nian Wu, Kevin Murphy, Tim Salimans, Ben Poole, Ruiqi Gao
The paper introduces EM Distillation (EMD), a maximum likelihood-based approach to distill a pre-trained diffusion model into a one-step generator model while minimizing the loss of perceptual quality. EMD leverages the Expectation-Maximization (EM) framework, updating generator parameters using samples from the joint distribution of the diffusion teacher's prior and inferred generator latents. The authors develop a reparameterized sampling scheme and a noise cancellation technique to stabilize the distillation process. EMD outperforms existing one-step generative methods in terms of FID scores on ImageNet-64 and ImageNet-128, and performs well on text-to-image generation distilled from Stable Diffusion models. The method is flexible, allowing interpolation between mode-seeking and mode-covering divergences through different sampling schemes.
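To make the EM structure concrete, here is a minimal toy sketch of the idea in one dimension. This is not the authors' algorithm: the closed-form Gaussian generator, the hand-written `teacher_score`, and the plain Langevin refinement standing in for sampling from the teacher/latent joint are all illustrative assumptions; EMD itself uses a reparameterized sampling scheme and noise cancellation that this sketch omits.

```python
import numpy as np

rng = np.random.default_rng(0)

def teacher_score(x):
    # Stand-in "teacher": score function of N(3, 1), i.e. grad log p(x).
    # A real diffusion teacher would provide a learned, noise-level-dependent score.
    return -(x - 3.0)

# Toy one-step generator g_theta(z) = mu + sigma * z, z ~ N(0, 1).
mu, sigma = 0.0, 1.0

for step in range(500):
    # E-step (sketch): draw samples from the current generator, then refine
    # them toward the teacher distribution with a few Langevin steps,
    # loosely mimicking inference of samples under the teacher's density.
    z = rng.standard_normal(256)
    x = mu + sigma * z
    eps = 0.1
    for _ in range(5):
        x = x + eps * teacher_score(x) + np.sqrt(2 * eps) * rng.standard_normal(x.shape)
    # M-step: maximum-likelihood update of the generator parameters given
    # the refined samples (closed form here because the generator is Gaussian;
    # EMD instead takes gradient steps on the generator network).
    mu = float(x.mean())
    sigma = float(x.std())
```

After the loop, the generator mean drifts from its initialization at 0 toward the teacher mean of 3, illustrating how alternating sample refinement (E) and likelihood updates (M) pulls a one-step generator onto the teacher's distribution.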