CFG++: Manifold-Constrained Classifier Free Guidance for Diffusion Models

12 Jun 2024 | Hyungjin Chung*, Jeongsol Kim*, Geon Yeong Park*, Hyelin Nam*, Jong Chul Ye
The paper "CFG++: Manifold-Constrained Classifier Free Guidance for Diffusion Models" addresses the limitations of Classifier-Free Guidance (CFG) in text-guided generation, particularly the off-manifold phenomenon and mode collapse. CFG++ introduces a geometric correction by reformulating text guidance as an optimization problem with a text-conditioned score matching loss. This keeps the sampling process on the clean data manifold, avoiding extrapolation and improving the quality of generated images. CFG++ also improves the invertibility of DDIM inversion, enabling better image editing and reconstruction. Experimental results show that CFG++ outperforms CFG in text-to-image generation, DDIM inversion, and text-conditioned inverse problems, maintaining text alignment while reducing artifacts. The method is particularly useful for applications that require accurate denoised latent estimates, such as diffusion inverse solvers.
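To make the contrast between extrapolation and staying on the manifold concrete, here is a minimal sketch of a single deterministic DDIM step under each scheme. It is not taken from this summary: the function names, argument names, and NumPy arrays standing in for latents are all hypothetical, and the update rule reflects one common reading of the method, in which CFG uses the guided noise estimate for both denoising and renoising with a scale above 1, while CFG++ interpolates with a scale in [0, 1] and renoises with the unconditional noise estimate. Treat it as an illustration under those assumptions rather than the authors' implementation.

```python
import numpy as np

def ddim_step_cfg(x_t, eps_uncond, eps_cond, abar_t, abar_prev, scale):
    """One DDIM step with standard CFG (assumed form; scale is typically > 1)."""
    # Guided noise estimate: extrapolates past eps_cond when scale > 1.
    eps = eps_uncond + scale * (eps_cond - eps_uncond)
    # Denoised estimate from the guided noise (Tweedie-style formula).
    x0_hat = (x_t - np.sqrt(1.0 - abar_t) * eps) / np.sqrt(abar_t)
    # Renoise with the same guided noise.
    return np.sqrt(abar_prev) * x0_hat + np.sqrt(1.0 - abar_prev) * eps

def ddim_step_cfgpp(x_t, eps_uncond, eps_cond, abar_t, abar_prev, lam):
    """One DDIM step in the CFG++ style (assumed form; lam is meant to lie in [0, 1])."""
    # Interpolated noise estimate: stays between eps_uncond and eps_cond.
    eps = eps_uncond + lam * (eps_cond - eps_uncond)
    # Denoised estimate from the interpolated noise.
    x0_hat = (x_t - np.sqrt(1.0 - abar_t) * eps) / np.sqrt(abar_t)
    # Renoise with the unconditional noise instead of the guided one.
    return np.sqrt(abar_prev) * x0_hat + np.sqrt(1.0 - abar_prev) * eps_uncond

# Toy usage with random arrays in place of real model outputs.
rng = np.random.default_rng(0)
x_t = rng.standard_normal((4, 8))
eps_u = rng.standard_normal((4, 8))
eps_c = rng.standard_normal((4, 8))
x_prev_cfg = ddim_step_cfg(x_t, eps_u, eps_c, abar_t=0.5, abar_prev=0.7, scale=7.5)
x_prev_cfgpp = ddim_step_cfgpp(x_t, eps_u, eps_c, abar_t=0.5, abar_prev=0.7, lam=0.6)
```

In this sketch the two functions differ only in the last line: the renoising term. With a scale of 0, or when the conditional and unconditional estimates coincide, both reduce to the same unconditional DDIM step.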