Improved Distribution Matching Distillation for Fast Image Synthesis

24 May 2024 | Tianwei Yin, Michaël Gharbi, Taesung Park, Richard Zhang, Eli Shechtman, Frédo Durand, William T. Freeman
This paper introduces DMD2, an improved distribution matching distillation method for fast image synthesis. DMD2 addresses the limitations of the original DMD approach by eliminating its regression loss, which was computationally expensive and limited the student model's performance. The key improvements are: (1) removing the regression loss and stabilizing training with a two-time-scale update rule, under which the fake score estimator is updated more frequently than the generator; (2) integrating a GAN loss that trains the student on real data, improving image quality; and (3) extending the student to multi-step sampling, which resolves the training-inference mismatch.

These changes allow DMD2 to achieve state-of-the-art results in one-step image generation, with FID scores of 1.28 on ImageNet-64×64 and 8.35 on zero-shot COCO 2014, surpassing the original teacher model despite a 500× reduction in inference cost. DMD2 can also generate megapixel images by distilling SDXL, demonstrating exceptional visual quality among few-step methods, and extends to multi-step generators for further gains. The method is evaluated on class-conditional image generation on ImageNet-64×64 and text-to-image synthesis on COCO 2014, where it outperforms previous work and rivals the performance of the teacher model.

The paper also discusses the method's limitations, including a slight degradation in image diversity compared to the teacher model and the need for four sampling steps to match the quality of the largest SDXL model. The authors note potential positive and negative societal impacts of their work, acknowledge their funding sources, and credit the contributions of several individuals and organizations.
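The interaction between the distribution-matching gradient and the two-time-scale update rule can be sketched on a toy one-dimensional problem. The snippet below is a highly simplified illustration, not the authors' implementation: the teacher and fake score functions are the analytic scores of unit-variance Gaussians, the generator is a single shift parameter, and the GAN loss and multi-step sampling are omitted. All names and hyperparameters here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

MU_REAL = 2.0   # mean of the "real" data distribution N(2, 1)
theta = 0.0     # one-parameter generator: x = z + theta, z ~ N(0, 1)
mu_fake = 0.0   # fake score model: tracks the generator's output mean
LR_G = 0.1      # generator learning rate
EMA = 0.5       # fake-score update rate
K = 5           # two-time-scale rule: K fake-score updates per generator update

def real_score(x):
    # Analytic score of N(MU_REAL, 1): d/dx log p(x) = MU_REAL - x
    return MU_REAL - x

def fake_score(x):
    # Analytic score of N(mu_fake, 1), the current fake-distribution estimate
    return mu_fake - x

for step in range(200):
    # Fake score model: updated K times per generator step, so it stays
    # an accurate estimate of the moving generator distribution.
    for _ in range(K):
        x = rng.standard_normal(256) + theta            # generator samples
        mu_fake = (1 - EMA) * mu_fake + EMA * x.mean()  # fit fake distribution

    # Generator: follow the distribution-matching gradient, the gap
    # between the teacher (real) score and the fake score at the samples.
    x = rng.standard_normal(256) + theta
    grad = (real_score(x) - fake_score(x)).mean()
    theta += LR_G * grad

print(round(theta, 2))  # theta converges toward MU_REAL = 2.0
```

Because both scores are Gaussian here, the matching gradient reduces to `MU_REAL - mu_fake`; the point of the toy is the schedule itself: if the inner loop is skipped (`K = 0` with a stale `mu_fake`), the gradient estimate lags the generator and training destabilizes, which is the failure mode the two-time-scale rule addresses.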