26 Mar 2024 | Donghoon Ahn, Hyoungwon Cho, Jaewon Min, Wooseok Jang, Jungwoo Kim, SeonHwa Kim, Hyun Hee Park, Kyong Hwan Jin, Seungryong Kim
Perturbed-Attention Guidance (PAG) is a sampling guidance method for diffusion models that improves sample quality in both conditional and unconditional settings without requiring additional training or external modules. PAG uses the self-attention mechanism of the diffusion U-Net to generate intermediate samples with degraded structure by substituting selected self-attention maps with an identity matrix, and then guides the denoising process away from these degraded samples. In implementation terms, the selected self-attention maps are modified through a perturbed self-attention (PSA) module, whose output is used to form the final noise prediction. PAG has been tested on several diffusion models, including ADM and Stable Diffusion, and improves sample quality as well as performance in downstream tasks such as image restoration, inverse problems, ControlNet, and text-to-image synthesis. It also improves results in tasks where existing guidance methods such as classifier-free guidance (CFG) are not applicable. The reported results indicate that PAG improves the quality of generated images while maintaining diversity, and that it outperforms existing guidance methods on both sample quality and diversity.
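The identity-map substitution described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names (`self_attention`, `perturbed_self_attention`, `pag_combine`), the single-head shapes, and the guidance scale `scale` are all hypothetical, and the extrapolation used to steer away from the degraded prediction is assumed to follow the usual CFG-style form, which the summary itself does not spell out.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(q, k, v):
    # Standard single-head scaled dot-product self-attention.
    # q, k, v: arrays of shape (num_tokens, dim).
    d = q.shape[-1]
    attn_map = softmax(q @ k.T / np.sqrt(d), axis=-1)
    return attn_map @ v


def perturbed_self_attention(q, k, v):
    # Perturbed variant: the attention map is replaced by an identity
    # matrix, so each token attends only to itself and the output is v.
    # q and k are accepted only to keep the same call signature.
    num_tokens = q.shape[0]
    attn_map = np.eye(num_tokens)
    return attn_map @ v


def pag_combine(eps, eps_perturbed, scale):
    # ASSUMPTION: CFG-style extrapolation away from the degraded
    # prediction. scale = 0 recovers the unguided prediction.
    return eps + scale * (eps - eps_perturbed)
```

In a real sampler, `eps` and `eps_perturbed` would come from two forward passes of the same U-Net at each denoising step, one with ordinary self-attention and one with the perturbed variant in the selected layers. Because the second pass reuses the same weights, no extra training or external module is involved.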