[slides and audio] Hyper-SD: Trajectory Segmented Consistency Model for Efficient Image Synthesis

Hyper-SD is a framework for efficient image synthesis with diffusion models (DMs) that combines trajectory-preserving and trajectory-reformulating distillation techniques. Its main contributions are:

1. **Trajectory Segmented Consistency Distillation (TSCD)**: This method performs consistency distillation progressively within predefined time-step segments, which preserves the original ODE trajectory from a higher-order perspective. It divides the time steps into segments, enforces consistency within each segment, and gradually reduces the number of segments until consistency holds across all time steps.
2. **Human Feedback Learning**: This technique uses human feedback to optimize the accelerated model, modifying the ODE trajectories to better suit few-step inference. It improves performance in the low-step regime and mitigates the performance loss incurred by distillation.
3. **Score Distillation**: This approach further improves the model's one-step generation capability by using score-based distribution matching distillation. It achieves idealized all-time consistency via a unified Low-Rank Adaptation (LoRA) that supports inference at all step counts.

Experiments demonstrate that Hyper-SD achieves state-of-the-art (SOTA) performance in low-step inference for both SDXL and SD1.5, outperforming other methods in image quality, style, and text-to-image alignment. The framework is open-sourced, and the authors provide LoRAs for SDXL and SD1.5 covering 1- to 8-step inference, along with a dedicated one-step SDXL model.
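The segment idea behind TSCD can be illustrated with a small sketch. It is not the authors' implementation: the function names, the 1000-step time range, and the stage sequence of 8, 4, 2, and 1 segments are illustrative assumptions. The sketch only shows how a time step could be assigned to a segment, and how the point it should be consistent with moves toward 0 as the number of segments shrinks.

```python
def segment_boundaries(num_train_steps: int, num_segments: int) -> list[int]:
    """Evenly spaced segment boundaries over [0, num_train_steps].

    Hypothetical helper: assumes the step count divides evenly.
    """
    if num_segments <= 0 or num_train_steps % num_segments != 0:
        raise ValueError("num_train_steps must be a positive multiple of num_segments")
    width = num_train_steps // num_segments
    return [i * width for i in range(num_segments + 1)]


def segment_start(t: int, num_train_steps: int, num_segments: int) -> int:
    """Lower boundary of the segment that contains time step t.

    Segments are treated as half-open intervals (start, start + width],
    so every t in 1..num_train_steps falls into exactly one segment.
    Within a stage, all states in a segment would be trained to agree
    on the prediction at this boundary.
    """
    if not 1 <= t <= num_train_steps:
        raise ValueError("t must lie in 1..num_train_steps")
    width = num_train_steps // num_segments
    return ((t - 1) // width) * width


def targets_per_stage(t: int, num_train_steps: int, stages: tuple[int, ...]) -> dict[int, int]:
    """Map each stage (number of segments) to the consistency target for t."""
    return {k: segment_start(t, num_train_steps, k) for k in stages}


if __name__ == "__main__":
    # Assumed schedule: the number of segments is reduced stage by stage.
    stages = (8, 4, 2, 1)
    for k in stages:
        print(k, segment_boundaries(1000, k))
    # With a single segment, every time step targets 0 (all-time consistency).
    print(targets_per_stage(600, 1000, stages))
```

With one segment left, every time step shares the same target, which corresponds to the all-time consistency that the summary describes as the end point of the progressive schedule.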

Hyper-SD: Trajectory Segmented Consistency Model for Efficient Image Synthesis

22 May 2024 | Yuxi Ren, Xin Xia, Yanzuo Lu, Jiacheng Zhang, Jie Wu, Pan Xie, Xing Wang, Xuefeng Xiao