18 Mar 2024 | Axel Sauer, Frederic Boesel, Tim Dockhorn, Andreas Blattmann, Patrick Esser, Robin Rombach
The paper introduces Latent Adversarial Diffusion Distillation (LADD), a novel distillation approach that overcomes the limitations of existing methods such as Adversarial Diffusion Distillation (ADD). Unlike ADD, which relies on pixel-based discriminators, LADD uses generative features from pretrained latent diffusion models. This simplifies training and improves performance for high-resolution, multi-aspect-ratio image synthesis. The authors apply LADD to Stable Diffusion 3 (SD3) to create SD3-Turbo, a fast model that matches the performance of state-of-the-art text-to-image generators using only four unguided sampling steps. The paper also investigates the scaling behavior of SD3-Turbo and demonstrates its effectiveness in applications such as image editing and inpainting. Key contributions include the introduction of LADD, the use of synthetic data for training, and a comparison with other distillation approaches.
The results show that SD3-Turbo matches or outperforms state-of-the-art models in both image quality and prompt alignment while offering faster inference.