SDXS: Real-Time One-Step Latent Diffusion Models with Image Conditions

17 Apr 2024 | Yuda Song, Zehao Sun, Xuanwu Yin
SDXS is a real-time, one-step latent diffusion model with image conditions. The paper introduces SDXS-512 and SDXS-1024, which reach inference speeds of approximately 100 FPS and 30 FPS respectively on a single GPU, significantly faster than the base models SD v1.5 and SDXL. Latency is reduced along two axes: model miniaturization and a reduction of sampling steps to a single step. Knowledge distillation is used to streamline the U-Net and image decoder architectures, and a novel one-step training technique combines feature matching with score distillation.

Beyond text-to-image synthesis, the models can be adapted for image-conditioned control, enabling efficient image-to-image translation. The paper also explores integrating ControlNet into the one-step model, allowing image-conditioned generation without giving up single-step inference. The method achieves significant gains in inference speed and efficiency while maintaining high-quality image generation; the results show SDXS outperforming existing models in both speed and quality, and it can be applied to tasks such as text-to-image generation, image editing, inpainting, and super-resolution.

Finally, the paper discusses the challenges of deploying diffusion models on low-power devices and proposes solutions to address them, demonstrating the potential of diffusion models for real-time image generation on edge devices.
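To make the training recipe more concrete, below is a minimal PyTorch-style sketch of how a one-step objective that combines feature matching against a frozen teacher with a score-distillation term could be structured. This is not the authors' implementation: the module interfaces (`student`, `teacher.extract_features`, `score_net.add_noise`, the loss weights) are all assumptions made for illustration.

```python
# Illustrative sketch only (not the SDXS code). Assumes hypothetical modules:
# a one-step student generator, a frozen teacher U-Net exposing intermediate
# features, and a pretrained score network with a standard noising schedule.
import torch
import torch.nn.functional as F

def one_step_distillation_loss(student, teacher, score_net, latents, text_emb,
                               lambda_fm=1.0, lambda_sd=1.0):
    """Hypothetical combined loss for training a one-step student generator."""
    # One-step student prediction: map noise directly to a clean latent.
    noise = torch.randn_like(latents)
    x_student = student(noise, text_emb)  # single forward pass

    # Feature matching: align intermediate teacher features computed on the
    # student's output with those computed on reference latents.
    with torch.no_grad():
        feats_ref = teacher.extract_features(latents, text_emb)
    feats_gen = teacher.extract_features(x_student, text_emb)
    loss_fm = sum(F.mse_loss(g, r) for g, r in zip(feats_gen, feats_ref))

    # Score distillation: nudge the student's output toward samples the
    # pretrained score network considers likely (SDS-style surrogate, with
    # the score-network gradient treated as a constant).
    t = torch.randint(0, score_net.num_timesteps,
                      (latents.shape[0],), device=latents.device)
    eps = torch.randn_like(x_student)
    x_noisy = score_net.add_noise(x_student, eps, t)
    with torch.no_grad():
        eps_pred = score_net(x_noisy, t, text_emb)
    grad = eps_pred - eps
    loss_sd = (grad.detach() * x_student).mean()

    return lambda_fm * loss_fm + lambda_sd * loss_sd
```

In practice, objectives of this form typically weight and schedule the two terms differently over the course of training; the fixed weights above are placeholders, not values from the paper.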