February 13, 2024 | Gen Li, Zhihan Huang, Yuting Wei
This paper presents a theoretical analysis of consistency training in diffusion models. Consistency models aim to generate samples close to the target distribution by learning a function that maps any point along the diffusion process back to its starting point. While empirical results show that consistency models achieve state-of-the-art performance, a comprehensive theoretical understanding of their effectiveness has remained limited.

The paper develops a non-asymptotic convergence theory for consistency training, showing that the number of steps required to generate samples within ε proximity of the target distribution (measured in the Wasserstein metric) is on the order of d^(5/2)/ε, where d is the data dimension. This theory offers rigorous insight into the validity and efficacy of consistency models and highlights their utility in downstream inference tasks. The authors also emphasize the flexibility and versatility of consistency models, which require only the enforcement of self-consistency conditions.

The main contributions are to establish theoretical underpinnings for consistency models, with a focus on consistency training, and to show that a sequence of functions can be learned that enables one-shot sampling with desirable fidelity. The paper also introduces notation and preliminary concepts for diffusion-based generative models and consistency models: the forward process progressively perturbs data into noise, while the reverse process generates samples from noise. Finally, it analyzes the role of consistency enforcement in preserving sampling fidelity and provides theoretical guarantees for consistency training.
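To make the forward process concrete, here is a minimal NumPy sketch of the standard discrete (DDPM-style) forward perturbation in closed form. The variance schedule and the toy bimodal data below are illustrative choices of ours, not values from the paper.

```python
import numpy as np

def forward_perturb(x0, alpha_bar_t, rng):
    # Closed form of the discrete forward process: starting from data X_0,
    #   X_t = sqrt(alpha_bar_t) * X_0 + sqrt(1 - alpha_bar_t) * Z,  Z ~ N(0, I),
    # which interpolates between the data (alpha_bar_t near 1) and pure noise
    # (alpha_bar_t near 0).
    z = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * z

rng = np.random.default_rng(1)
# Toy bimodal "data" in 2-D, deliberately far from Gaussian.
x0 = np.where(rng.random((5000, 2)) < 0.5, -2.0, 2.0)
x0 += 0.1 * rng.standard_normal((5000, 2))

betas = np.linspace(1e-4, 0.02, 1000)  # illustrative variance schedule
alpha_bar = np.cumprod(1.0 - betas)    # cumulative product of (1 - beta_t)
x_mid = forward_perturb(x0, alpha_bar[200], rng)  # partially noised
x_T = forward_perturb(x0, alpha_bar[-1], rng)     # essentially pure noise
```

By the final step, `alpha_bar[-1]` is tiny, so `x_T` is approximately standard Gaussian regardless of the data distribution; the reverse (sampling) process starts from such noise.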
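The self-consistency condition described above can be sketched in code. The snippet below is an illustrative toy, not the paper's construction: it uses a variance-exploding parameterization `x_t = x_0 + t·z` and a skip/output reparameterization so that the boundary condition f(x, t_min) = x holds by design, then computes a consistency-training-style loss by noising the same data point at two adjacent times with the same noise. The coefficient forms, `T_MIN`, and the stand-in "networks" are our assumptions.

```python
import numpy as np

T_MIN = 0.002  # smallest time in the discretization (illustrative value)

def c_skip(t, sigma_data=0.5):
    # Skip coefficient; equals 1 at t = T_MIN.
    return sigma_data**2 / ((t - T_MIN)**2 + sigma_data**2)

def c_out(t, sigma_data=0.5):
    # Output coefficient; vanishes at t = T_MIN.
    return sigma_data * (t - T_MIN) / np.sqrt(t**2 + sigma_data**2)

def f(x, t, net):
    # Consistency function with the boundary condition built in:
    # f(x, T_MIN) = x exactly, for any network `net`.
    return c_skip(t) * x + c_out(t) * net(x)

def consistency_training_loss(x0, t_n, t_np1, net, net_ema, rng):
    # Perturb the SAME data point with the SAME Gaussian noise at two
    # adjacent times, then penalize disagreement between the two
    # predicted starting points (self-consistency).
    z = rng.standard_normal(x0.shape)
    pred = f(x0 + t_np1 * z, t_np1, net)    # online network, later time
    target = f(x0 + t_n * z, t_n, net_ema)  # frozen/EMA network, earlier time
    return np.mean((pred - target) ** 2)

rng = np.random.default_rng(0)
W = 0.1 * rng.standard_normal((4, 4))
net = lambda x: x @ W        # stand-in for a trainable network
net_ema = lambda x: x @ W    # stand-in for its EMA copy
x0 = rng.standard_normal((8, 4))
loss = consistency_training_loss(x0, 0.5, 0.6, net, net_ema, rng)
```

Once such a function is learned, one-shot sampling amounts to a single evaluation `f(x_T, T, net)` on pure noise, which is what the paper's convergence theory analyzes.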