February 13, 2024 | Gen Li, Zhihan Huang, Yuting Wei
This paper addresses the theoretical underpinnings of consistency training in diffusion models, which aims to improve sampling efficiency by learning a sequence of functions that map any point at any time step of the diffusion process back to its starting point. The authors demonstrate that, to generate samples within ε of the target distribution in the Wasserstein metric, the number of steps in consistency training should exceed the order of \( d^{5/2} / ε \), where \( d \) is the data dimension. This theoretical framework provides rigorous insight into the validity and efficacy of consistency models, particularly for downstream inference tasks. The paper also lays out the assumptions and setup for the analysis, including the Lipschitz continuity of the learned mappings and the approximation error of the function class. The main result, established under these assumptions, shows that the Wasserstein distance between the sampled distribution and the target distribution is bounded by a function of the Lipschitz constant, the dimension, and the desired accuracy ε. The authors conclude by highlighting potential future directions, such as sharpening the dependencies on the Lipschitz constant and the ambient dimension, and comparing their theory with analyses of other generative sampling methods.
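To make the shape of the guarantee concrete, here is a rough schematic reading of the main bound; the symbols \( L \) (the Lipschitz constant from the assumptions) and \( T \) (the number of steps) are notation assumed here for illustration rather than quoted from the paper:

\[
W_1\bigl(p_{\text{output}},\, p_{\text{data}}\bigr) \;\lesssim\; \frac{L\, d^{5/2}}{T} \;+\; \bigl(\text{function-class approximation error}\bigr),
\]

so that, treating \( L \) as a constant and the approximation error as negligible, choosing \( T \gtrsim d^{5/2} / ε \) drives the Wasserstein-1 distance below ε, which matches the step-count requirement stated above.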