6 Jun 2024 | Tim Salimans, Thomas Mensink, Jonathan Heek, Emiel Hoogeboom
The paper presents a method for accelerating sampling in diffusion models by distilling them into faster, few-step models. The method matches conditional expectations of the clean data given noisy data along the sampling trajectory, generalizing one-step distillation to multi-step generators. With up to 8 sampling steps, the distilled models outperform both their one-step versions and the original many-step teacher models, achieving state-of-the-art results on ImageNet. The approach also scales to a large text-to-image model, generating high-resolution images quickly without autoencoders or upsamplers. Two variants of the algorithm are introduced: one alternates optimization of the distilled generator and an auxiliary denoising model; the other uses two independent minibatches per parameter update. Experiments on ImageNet and the text-to-image model confirm improved sampling quality and efficiency.
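The alternating variant can be illustrated with a toy sketch. Everything below is an illustrative assumption, not the paper's implementation: a 1-D Gaussian data distribution so the teacher's conditional expectation E[x | z] has a closed form, a deterministic one-parameter generator, and a one-parameter auxiliary denoiser. The loop alternates a least-squares update of the auxiliary (so it denoises the generator's samples) with a generator update that moves its output until the auxiliary's moment matches the teacher's:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (illustrative, not the paper's): data x ~ N(mu_data, 1),
# noisy sample z = x + noise_std * eps.
mu_data, noise_std = 2.0, 1.0

def teacher_denoise(z):
    # Exact posterior mean E[x | z] for unit-variance Gaussian data and noise.
    return (mu_data + z) / 2.0

g = 0.0   # one-step generator parameter (its deterministic output)
c = 0.0   # auxiliary denoiser's prediction for generator samples
lr = 0.1
for _ in range(500):
    x_gen = np.full(256, g)                            # generator samples
    z = x_gen + noise_std * rng.standard_normal(256)   # re-noised samples
    # Auxiliary step: fit c to denoise generator samples (least squares).
    c -= lr * 2.0 * np.mean(c - x_gen)
    # Generator step: push the generator's output so the auxiliary's
    # moment matches the teacher's conditional expectation.
    g += lr * np.mean(teacher_denoise(z) - c)

# g approaches mu_data: the distilled generator matches the teacher's moments.
```

In this toy, the auxiliary tracks the generator (c ≈ g), so the generator update reduces to moving g toward the data mean, which is the moment-matching fixed point; the paper's second variant replaces the alternating loop with two independent minibatches per update.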