Consistency Models Made Easy
20 Jun 2024 | Zhengyang Geng, Ashwini Pokle, William Luo, Justin Lin, J. Zico Kolter
Consistency models (CMs) are an emerging class of generative models that offer much faster sampling than traditional diffusion models. However, CMs are resource-intensive to train, with state-of-the-art (SoTA) models requiring substantial compute. This work proposes an efficient training scheme for CMs, termed Easy Consistency Tuning (ECT), which significantly reduces training cost while maintaining or improving sample quality. ECT expresses the consistency condition along a CM trajectory as a differential equation and views diffusion models as a special case of CMs under a particular discretization. Starting from a pre-trained diffusion model, ECT progressively approximates the full consistency condition by shrinking the time discretization as training proceeds, yielding a streamlined recipe that cuts both training time and computational cost. Experiments on CIFAR-10 and ImageNet 64×64 show that ECT achieves better 2-step sample quality than previous methods while using less than 2% of the training FLOPs. ECT also reduces inference cost to roughly 1/1000 of that of pre-trained diffusion models while maintaining comparable generation quality. The scalability of ECT is further investigated: it follows a classic power-law scaling, indicating its potential for large-scale applications. Overall, ECT provides a simple and principled approach to efficiently training CMs, unlocking state-of-the-art few-step generative capabilities with minimal tuning cost.
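
To make the idea of "shrinking the time discretization" concrete, below is a minimal sketch of one ECT-style training step in PyTorch. It assumes an EDM-style parameterization in which the consistency model f_theta(x_t, t) maps a noisy sample x_t = x_0 + t * eps back to the clean sample x_0; the function names and the linear gap schedule are illustrative placeholders, not the authors' code.

```python
import torch

def ect_training_step(model, x0, step, total_steps):
    """One ECT-style update: enforce f(x_t, t) ~= f(x_r, r) for r < t on the
    same trajectory, with the gap t - r shrinking as training progresses."""
    batch = x0.shape[0]

    # Sample a noise level t per example (e.g., log-normal, as in EDM-style training).
    t = torch.exp(torch.randn(batch, 1, 1, 1, device=x0.device) * 1.2 - 1.1)

    # Gap schedule: r = 0 at the start recovers the diffusion (denoising) objective,
    # since f(x_0, 0) = x_0; as r -> t the loss approaches the full consistency
    # condition. This linear ramp is illustrative; the paper uses a refined schedule.
    progress = step / total_steps
    r = t * progress

    # Share one noise sample so x_t and x_r lie on the same trajectory.
    eps = torch.randn_like(x0)
    x_t = x0 + t * eps
    x_r = x0 + r * eps

    # Target from the smaller noise level, with gradients stopped (self-teacher).
    with torch.no_grad():
        target = model(x_r, r)

    pred = model(x_t, t)
    # Plain squared error for simplicity; a weighted or pseudo-Huber metric
    # would be closer to practice.
    loss = torch.mean((pred - target) ** 2)
    return loss
```

Because the gap starts at its maximum (r = 0), the first updates simply continue diffusion-style denoising training of the pre-trained model, and the consistency constraint is tightened only gradually, which is what keeps the tuning cost low.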