22 May 2024 | Yuxi Ren, Xin Xia, Yanzuo Lu, Jiacheng Zhang, Xing Wang, Jie Wu, Xuefeng Xiao*
Hyper-SD is a trajectory segmented consistency model designed for efficient image synthesis. It combines the advantages of ODE trajectory preservation and ODE trajectory reformulation to maintain near-lossless performance during step compression. The model introduces Trajectory Segmented Consistency Distillation (TSCD), which divides the time steps into segments and enforces consistency within each segment, so that the original ODE trajectory is better preserved. It also incorporates human feedback learning to boost performance in the low-step regime and to mitigate the performance loss caused by distillation. Score distillation is used to further improve low-step generation and to enable a unified LoRA that supports inference at all step counts.
Extensive experiments show that Hyper-SD achieves state-of-the-art performance for both SDXL and SD1.5 across 1 to 8 inference steps. For example, Hyper-SDXL outperforms SDXL-Lightning by +0.68 in CLIP Score and +0.51 in Aes Score in 1-step inference. The model is also compatible with ControlNet and various base models, and its unified LoRA enables efficient inference with different step counts. Hyper-SD is open-sourced, providing LoRAs for SDXL and SD1.5 from 1 to 8 steps and a dedicated one-step SDXL model to advance generative AI. The method shows promising results in generating high-quality images with few inference steps, though there are areas for further improvement, such as retaining negative cues and exploring diffusion transformer architectures.
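The segmentation idea behind TSCD can be illustrated with a small sketch. This is not the authors' implementation: it assumes a discrete range of T = 1000 training time steps, uniformly sized segments, and a progressive schedule of 8, 4, 2, and 1 segments, none of which are stated in the summary above. The helper names `segment_boundaries` and `segment_start` are hypothetical. The sketch only shows how a time step would be mapped to the lower edge of its segment, which is the point that a consistency target inside that segment would refer to.

```python
from bisect import bisect_left


def segment_boundaries(total_steps: int, num_segments: int) -> list[int]:
    """Return num_segments + 1 evenly spaced boundaries from 0 to total_steps.

    Uniform spacing is an assumption made for this illustration.
    """
    if num_segments < 1 or total_steps % num_segments != 0:
        raise ValueError("total_steps must be a positive multiple of num_segments")
    width = total_steps // num_segments
    return [i * width for i in range(num_segments + 1)]


def segment_start(t: int, boundaries: list[int]) -> int:
    """Return the lower boundary of the segment that contains time step t.

    Segments are treated as half-open intervals (lower, upper], so a time
    step that sits exactly on a boundary belongs to the segment below it.
    """
    if not 0 < t <= boundaries[-1]:
        raise ValueError("t must satisfy 0 < t <= total_steps")
    # bisect_left finds the first boundary >= t, i.e. the segment's upper edge.
    upper_index = bisect_left(boundaries, t)
    return boundaries[upper_index - 1]


if __name__ == "__main__":
    total_steps = 1000  # assumed number of training time steps
    t = 600             # an arbitrary noisy time step
    # Assumed progressive schedule: fewer, wider segments at each stage.
    for k in (8, 4, 2, 1):
        bounds = segment_boundaries(total_steps, k)
        print(f"k={k}: t={t} is tied to segment start {segment_start(t, bounds)}")
```

With a single segment, every time step maps to 0, which corresponds to ordinary consistency distillation over the whole trajectory; with more segments, each step is only tied to a nearby point, which is the intuition for why segmenting helps preserve the original ODE trajectory.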