Imagine Flash: Accelerating Emu Diffusion Models with Backward Distillation

8 May 2024 | Jonas Kohler*, Albert Pumarola*, Edgar Schönfeld*, Artsiom Sanakoyeu*, Roshan Sumbaly, Peter Vajda, and Ali Thabet
The paper "Imagine Flash: Accelerating Emu Diffusion Models with Backward Distillation" introduces a novel distillation framework designed to enable high-fidelity, diverse sample generation using just one to three steps. The key contributions of the work are: 1. **Backward Distillation**: This technique calibrates the student model on its own backward trajectory, reducing the gap between training and inference distributions and ensuring zero data leakage during training across all time steps. 2. **Shifted Reconstruction Loss (SRL)**: This adaptive loss function dynamically adapts knowledge transfer from the teacher model, focusing on global structural information at high time steps and fine-grained details at lower time steps. 3. **Noise Correction**: This training-free inference modification enhances sample quality by addressing singularities in noise prediction, particularly in the initial sampling step. The authors demonstrate that their method outperforms existing competitors in both quantitative metrics and human evaluations, achieving performance comparable to the teacher model using only three denoising steps. The approach is evaluated on the Emu diffusion model, showing significant improvements in image quality and efficiency compared to other methods. The paper also includes extensive experiments and qualitative comparisons to highlight the effectiveness of the proposed techniques.The paper "Imagine Flash: Accelerating Emu Diffusion Models with Backward Distillation" introduces a novel distillation framework designed to enable high-fidelity, diverse sample generation using just one to three steps. The key contributions of the work are: 1. **Backward Distillation**: This technique calibrates the student model on its own backward trajectory, reducing the gap between training and inference distributions and ensuring zero data leakage during training across all time steps. 2. **Shifted Reconstruction Loss (SRL)**: This adaptive loss function dynamically adapts knowledge transfer from the teacher model, focusing on global structural information at high time steps and fine-grained details at lower time steps. 3. **Noise Correction**: This training-free inference modification enhances sample quality by addressing singularities in noise prediction, particularly in the initial sampling step. The authors demonstrate that their method outperforms existing competitors in both quantitative metrics and human evaluations, achieving performance comparable to the teacher model using only three denoising steps. The approach is evaluated on the Emu diffusion model, showing significant improvements in image quality and efficiency compared to other methods. The paper also includes extensive experiments and qualitative comparisons to highlight the effectiveness of the proposed techniques.