Multistep Consistency Models

19 Nov 2024 | Jonathan Heek*, Emiel Hoogeboom* and Tim Salimans
This paper introduces Multistep Consistency Models, an approach that combines the strengths of Consistency Models and TRACT (Berthelot et al., 2023) to bridge the gap between standard diffusion models and low-step sampling methods. Consistency models can sample in very few steps but often sacrifice image quality, while diffusion models produce higher-quality samples at a much greater sampling cost. Multistep Consistency Models split the diffusion process into multiple segments, exposing a trade-off between sampling speed and quality: a 1-step multistep consistency model is a conventional consistency model, while in the limit of infinitely many steps the method becomes equivalent to a standard diffusion model.

The authors propose a unified training algorithm that divides the diffusion process into predefined segments, each trained with a consistency objective that maps noisy inputs to the segment boundary, with all segments sharing the same network parameters (a minimal sketch of this objective is given below). Restricting each consistency mapping to a single segment simplifies the modeling task and significantly improves performance.

They also introduce a deterministic sampler, Adjusted DDIM (aDDIM), which compensates for the variance that the standard DDIM update loses by substituting the conditional mean for a sample of the clean data, leading to better sample quality (a second sketch below illustrates the idea). Experiments on ImageNet demonstrate that Multistep Consistency Models achieve state-of-the-art FID scores with as few as 4 or 8 sampling steps, outperforming existing methods. The paper further shows that fine-tuning Multistep Consistency Models from pre-trained diffusion checkpoints leads to faster and more stable convergence, and evaluates the method on text-to-image models, obtaining high-quality samples with minimal computational overhead.
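To make the segment-wise training concrete, here is a minimal sketch of a multistep consistency training step, assuming a variance-preserving cosine noise schedule and z-space matching at the segment boundary. The helper names (`alpha_sigma`, `multistep_consistency_loss`) and the exact step sizes and loss weighting are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def alpha_sigma(t):
    """Assumed variance-preserving cosine schedule: z_t = alpha*x + sigma*eps."""
    return np.cos(0.5 * np.pi * t), np.sin(0.5 * np.pi * t)

def multistep_consistency_loss(model, ema_model, x, num_segments=4, T=128, rng=None):
    """One training example of a multistep consistency objective (sketch).

    Diffusion time [0, 1] is split into `num_segments` equal segments.
    For a noisy z_t inside a segment, the student's prediction is pushed
    to the segment's start boundary and matched there against a target
    built by stepping the EMA teacher slightly toward that boundary.
    """
    rng = rng or np.random.default_rng()
    steps_per_seg = T // num_segments
    seg = rng.integers(num_segments)          # which segment we train on
    i = rng.integers(steps_per_seg)           # discrete time within the segment
    t = (seg * steps_per_seg + i + 1) / T     # current time
    s = t - 1.0 / T                           # one small step earlier
    t_seg = seg * steps_per_seg / T           # boundary this segment maps to

    eps = rng.standard_normal(x.shape)
    a_t, s_t = alpha_sigma(t)
    z_t = a_t * x + s_t * eps                 # forward-diffused input

    # Teacher: one small deterministic (DDIM-style) step from t to s.
    x_teacher = ema_model(z_t, t)
    a_s, s_s = alpha_sigma(s)
    z_s = a_s * x_teacher + (s_s / s_t) * (z_t - a_t * x_teacher)

    def to_boundary(z, t_from, x_pred):
        """DDIM-style jump from time t_from straight to the segment boundary."""
        a_f, s_f = alpha_sigma(t_from)
        a_b, s_b = alpha_sigma(t_seg)
        return a_b * x_pred + (s_b / s_f) * (z - a_f * x_pred)

    # Target: teacher's state carried to the boundary (already there if s == t_seg).
    if s <= t_seg + 1e-12:
        target = z_s
    else:
        target = to_boundary(z_s, s, ema_model(z_s, s))
    pred = to_boundary(z_t, t, model(z_t, t))
    return np.mean((pred - target) ** 2)
```

Because every segment is trained with the same shared network, sampling simply applies the model once per segment, walking from pure noise down to data in `num_segments` evaluations.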
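The variance correction behind aDDIM can be sketched in a few lines. Standard DDIM plugs the conditional mean x̂ into the update, so the resulting z_s has less variance than it would if x were actually sampled. The sketch below inflates the noise coefficient to add that variance back; the particular scaling used here, and the `x_var` estimate of the per-dimension variance of x given z_t, are assumptions for illustration rather than the paper's exact formula (it reuses `alpha_sigma` from the sketch above).

```python
import numpy as np

def addim_step(x_hat, z_t, t, s, x_var):
    """One deterministic sampler step with a variance correction (sketch).

    Plain DDIM would return a_s * x_hat + s_s * eps_hat. Because x_hat is a
    conditional mean, ||z_s|| shrinks over many steps; here we enlarge the
    noise term so z_s carries the variance a true sample of x would add
    (assumed form of the correction, not the paper's exact derivation).
    """
    a_t, s_t = alpha_sigma(t)
    a_s, s_s = alpha_sigma(s)
    eps_hat = (z_t - a_t * x_hat) / s_t       # implied noise estimate
    d = x_hat.size                            # data dimensionality
    # Inflate the noise coefficient by the variance the point estimate discards.
    scale = np.sqrt(s_s**2 + a_s**2 * d * x_var / np.sum(eps_hat**2))
    return a_s * x_hat + scale * eps_hat
```

Since the correction only rescales the deterministic update, aDDIM keeps DDIM's step count and cost unchanged while producing samples whose statistics better match the true reverse process.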
Overall, Multistep Consistency Models provide a practical and efficient solution for generating high-quality samples with reduced sampling time, making them a promising alternative to traditional diffusion models.