19 Apr 2024 | Xi Wang, Nicolas Dufour, Nefeli Andreou, Marie-Paule Cani, Victoria Fernández Abrevaya, David Picard, and Vicky Kalogeiton
This paper analyzes Classifier-Free Guidance (CFG) weight schedulers for diffusion models, focusing on how the choice of schedule affects image generation quality. The study compares static guidance, which keeps the guidance weight fixed, with heuristic and parameterized dynamic schedulers. High static guidance produces sharp but simplistic images, while low static guidance produces fuzzy but more detailed images. Dynamic guidance, which varies the weight across the denoising timesteps, achieves a better balance between image fidelity and condition adherence.
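A minimal sketch of where a guidance schedule enters sampling, written in plain Python so it runs without dependencies. It assumes one common CFG convention, `eps = eps_uncond + w * (eps_cond - eps_uncond)`, and it represents time as `t` in [0, 1], with `t = 1` at pure noise and `t = 0` at the final image. The names `cfg_combine`, `guided_eps`, `static`, and `Schedule` are hypothetical and do not come from the paper. The sketch shows that static and dynamic guidance share the same combination step and differ only in how the weight is chosen at each timestep.

```python
from typing import Callable, List, Sequence

# Hypothetical type: maps normalized time t in [0, 1]
# (t = 1 at pure noise, t = 0 at the final image) to a guidance weight.
Schedule = Callable[[float], float]


def cfg_combine(eps_uncond: Sequence[float], eps_cond: Sequence[float], w: float) -> List[float]:
    """Combine unconditional and conditional noise predictions.

    Assumed convention: eps = eps_uncond + w * (eps_cond - eps_uncond).
    w = 0 returns the unconditional prediction, w = 1 the conditional one,
    and w > 1 extrapolates beyond the conditional prediction.
    """
    return [u + w * (c - u) for u, c in zip(eps_uncond, eps_cond)]


def guided_eps(eps_uncond: Sequence[float], eps_cond: Sequence[float],
               t: float, schedule: Schedule) -> List[float]:
    # The only difference between static and dynamic guidance:
    # the weight is looked up at the current timestep t.
    return cfg_combine(eps_uncond, eps_cond, schedule(t))


def static(w: float) -> Schedule:
    # Static CFG: the same weight at every timestep.
    return lambda t: w


if __name__ == "__main__":
    eps_u, eps_c = [0.0, 1.0], [1.0, 3.0]
    # Static weight 2.0, identical at any t.
    print(guided_eps(eps_u, eps_c, t=0.9, schedule=static(2.0)))  # [2.0, 5.0]
    # A dynamic weight 4 * (1 - t) equals 2.0 only at t = 0.5.
    print(guided_eps(eps_u, eps_c, t=0.5, schedule=lambda t: 4.0 * (1.0 - t)))  # [2.0, 5.0]
```

Because the schedule is an ordinary function of `t`, replacing static guidance with a dynamic schedule changes only the weight lookup. The two model evaluations per step stay the same, so the switch adds no extra computation.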
The paper finds that simple, monotonically increasing weight schedulers consistently improve performance, with no additional computational cost and no extra tuning. More complex parameterized schedulers can improve results further, but their tuned settings do not generalize across models and tasks. Specifically, heuristic schedulers such as linear and cosine outperform static guidance in image fidelity, diversity, and textual adherence. Parameterized schedulers such as clamp-linear and pcs also improve performance, but they must be tuned for each specific model and task.
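The heuristic and clamp-linear schedules can be sketched as plain functions of normalized time `t` in [0, 1], with `t = 1` at the first (noisiest) sampling step. Two choices here are assumptions and are not taken from the paper. First, "increasing" is read as the weight growing as sampling moves toward the clean image. Second, each heuristic schedule is scaled so that its average over the trajectory equals a static weight `w_bar`, which makes it directly comparable to static guidance. The `clamp_linear` form and its `floor` parameter are a hypothetical reading of the scheduler's name. They illustrate the kind of extra parameter that would need per-model tuning.

```python
import math

# Normalized time: t = 1.0 at the first (noisiest) sampling step,
# t = 0.0 at the last step. "Increasing" means the weight grows as t -> 0.
# Heuristic schedules are scaled so their mean over t in [0, 1] equals
# w_bar, the static weight they are compared with (our own assumption).


def linear_increasing(t: float, w_bar: float) -> float:
    # Ramps from 0 at t = 1 to 2 * w_bar at t = 0; mean over [0, 1] is w_bar.
    return 2.0 * w_bar * (1.0 - t)


def cosine_increasing(t: float, w_bar: float) -> float:
    # Half-cosine ramp from 0 at t = 1 to 2 * w_bar at t = 0; the mean is
    # w_bar because cos(pi * t) integrates to zero over [0, 1].
    return w_bar * (1.0 + math.cos(math.pi * t))


def clamp_linear(t: float, w_bar: float, floor: float) -> float:
    # Hypothetical reading of "clamp-linear": the linear ramp, clamped from
    # below so the weight never drops under `floor`. `floor` is the kind of
    # extra parameter that would need tuning per model and task.
    return max(floor, linear_increasing(t, w_bar))


def weights_over_trajectory(schedule, num_steps: int, **kwargs) -> list:
    # Evaluate a schedule at num_steps evenly spaced times from t = 1 to t = 0.
    if num_steps < 2:
        raise ValueError("num_steps must be at least 2")
    return [schedule(1.0 - i / (num_steps - 1), **kwargs) for i in range(num_steps)]


if __name__ == "__main__":
    lin = weights_over_trajectory(linear_increasing, 5, w_bar=7.5)
    print([round(w, 2) for w in lin])  # [0.0, 3.75, 7.5, 11.25, 15.0]
    cos_ws = weights_over_trajectory(cosine_increasing, 5, w_bar=7.5)
    print([round(w, 2) for w in cos_ws])
    clamped = weights_over_trajectory(clamp_linear, 5, w_bar=7.5, floor=2.0)
    print([round(w, 2) for w in clamped])
```

Each schedule depends only on `t`, so it can replace a constant weight in the sampler's guidance step without touching the rest of the loop. The heuristic schedules introduce no new parameter beyond `w_bar`, whereas the clamped variant adds `floor`.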
The analysis is supported by quantitative results, including improvements in Fréchet Inception Distance (FID) and Inception Score (IS). Qualitative results also show better image detail, diversity, and text alignment. User studies confirm that images generated with dynamic schedulers are preferred over those generated with static guidance. The paper concludes that dynamic guidance schedulers strike a more effective balance between image quality and condition adherence. Simple increasing schedulers work without extra tuning, whereas parameterized ones require careful tuning for each model and task.
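The standard definitions of the two metrics help in reading the quantitative results; they are general definitions, not specific to this paper. Here $\mu_r, \Sigma_r$ and $\mu_g, \Sigma_g$ are the mean and covariance of Inception features for real and generated images, and $p(y \mid x)$ is the Inception classifier's label distribution for a generated image $x$. Lower FID and higher IS are better, so an improvement means FID decreases and IS increases.

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left(\Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2}\right)

\mathrm{IS} = \exp\!\left(\mathbb{E}_{x \sim p_g}\,
  D_{\mathrm{KL}}\big(p(y \mid x) \,\Vert\, p(y)\big)\right)
```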