8 May 2024 | Jonas Kohler*, Albert Pumarola*, Edgar Schönfeld*, Artsiom Sanakoyeu*, Roshan Sumbaly, Peter Vajda, and Ali Thabet
Imagine Flash is a novel distillation framework that accelerates diffusion models for text-to-image generation, reducing inference to only 1-3 denoising steps while maintaining high image quality. Its key components are Backward Distillation, which calibrates the student model on its own backward trajectory to reduce the discrepancy between training and inference; Shifted Reconstruction Loss, which dynamically adapts knowledge transfer from the teacher based on the current time step; and Noise Correction, which enhances sample quality by addressing singularities in noise prediction. With just three denoising steps, the framework achieves performance comparable to the teacher model, and it outperforms existing competitors in both quantitative metrics and human evaluations, demonstrating an effective trade-off between sampling efficiency and generation quality. Extensive experiments and human evaluations show superior realism, sharpness, and detail compared to state-of-the-art methods. The work highlights the potential of ultra-efficient generative modeling and opens avenues for future research in other modalities and applications.
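To make the Backward Distillation idea concrete, here is a minimal, self-contained PyTorch sketch of a single distillation training step. The toy denoiser, the linear noise schedule x_t = (1 - t) * x0 + t * eps, the re-noising step rule, and the three-step schedule values are all illustrative assumptions rather than the paper's actual text-to-image architecture or sampler, and the Shifted Reconstruction Loss and Noise Correction components are omitted here in favor of a plain teacher-prediction target.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy stand-in for the teacher/student denoisers. Everything here (module names,
# the linear noise schedule, the step rule, the 3-step schedule) is illustrative.
class ToyDenoiser(nn.Module):
    def __init__(self, dim: int = 16):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim + 1, 64), nn.SiLU(), nn.Linear(64, dim))

    def forward(self, x_t: torch.Tensor, t: float) -> torch.Tensor:
        t_col = torch.full((x_t.shape[0], 1), t)            # timestep as an extra feature
        return self.net(torch.cat([x_t, t_col], dim=-1))    # predicts x0 directly (assumption)


def step_towards(x_t, x0_pred, t, t_next):
    """Re-noise the predicted x0 to the next timestep under x_t = (1-t)*x0 + t*eps."""
    eps = (x_t - (1.0 - t) * x0_pred) / t                   # noise implied by the schedule
    return (1.0 - t_next) * x0_pred + t_next * eps


teacher, student = ToyDenoiser(), ToyDenoiser()
teacher.requires_grad_(False)                                # frozen teacher
optimizer = torch.optim.Adam(student.parameters(), lr=1e-4)
student_schedule = [1.0, 0.66, 0.33]                         # 3 student timesteps (illustrative)


def backward_distillation_step(batch_size=8, dim=16):
    # Backward Distillation: build the student's input at step k by running the
    # student on its OWN backward trajectory from pure noise, instead of noising a
    # ground-truth image. This is what removes the train/inference mismatch.
    k = torch.randint(len(student_schedule), (1,)).item()
    x_t = torch.randn(batch_size, dim)                       # start at t = 1 (pure noise)
    with torch.no_grad():
        for i in range(k):
            x0_pred = student(x_t, student_schedule[i])
            x_t = step_towards(x_t, x0_pred, student_schedule[i], student_schedule[i + 1])

    t_k = student_schedule[k]
    student_x0 = student(x_t, t_k)                           # student prediction (with grad)
    with torch.no_grad():
        teacher_x0 = teacher(x_t, t_k)                       # plain distillation target

    loss = F.mse_loss(student_x0, teacher_x0)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


if __name__ == "__main__":
    for _ in range(3):
        print(backward_distillation_step())
```

The essential point of the sketch is that x_t comes from the student's own sampler rollout rather than from adding noise to real images, so the distillation loss is computed on exactly the distribution of inputs the student will encounter at inference time.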