Towards Understanding Cross and Self-Attention in Stable Diffusion for Text-Guided Image Editing

6 Mar 2024 | Bingyan Liu, Chengyu Wang, Tingfeng Cao, Kui Jia, Jun Huang
This paper investigates the roles of cross-attention and self-attention mechanisms in Stable Diffusion for text-guided image editing. The authors find that cross-attention maps often contain object attribution information, which can lead to editing failures, while self-attention maps are crucial for preserving the geometric and shape details of the source image. Based on these findings, they propose a simplified, tuning-free method called Free-Prompt-Editing (FPE) that modifies only the self-attention maps of specified attention layers during the denoising process. Experimental results show that FPE consistently outperforms existing methods on multiple datasets. The study provides valuable insights into understanding cross and self-attention mechanisms in diffusion models and offers a practical solution for overcoming the limitations of inaccurate text-guided image editing.
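
The core idea summarized above, changing only the self-attention maps in chosen layers, can be illustrated with a toy sketch. This is not the authors' implementation: it assumes that "modifying" a self-attention map means swapping in the map computed from the source image while keeping the value vectors of the edited image, and every name here (`attention_map`, `apply_map`, `self_attention`, `source_maps`, `replace_layers`) is hypothetical. Plain Python lists stand in for the U-Net feature tensors.

```python
import math


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_map(queries, keys):
    """Attention map softmax(Q K^T / sqrt(d)); each row sums to 1."""
    d = len(queries[0])
    return [
        softmax([sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys])
        for q in queries
    ]


def apply_map(attn, values):
    """Weighted sum of value vectors (attn @ V)."""
    dim = len(values[0])
    return [
        [sum(w * v[j] for w, v in zip(row, values)) for j in range(dim)]
        for row in attn
    ]


def self_attention(layer_idx, q, k, v, source_maps, replace_layers):
    """Self-attention for the editing branch (illustrative assumption).

    In layers listed in replace_layers, the map cached from the source
    branch is used in place of the map computed from the target features.
    The values v always come from the target branch.
    """
    if layer_idx in replace_layers:
        attn = source_maps[layer_idx]
    else:
        attn = attention_map(q, k)
    return apply_map(attn, v)


# Toy features: two tokens, two channels (made-up numbers).
src_q = [[1.0, 0.0], [0.0, 1.0]]
src_k = [[1.0, 0.0], [0.0, 1.0]]
tgt_q = [[0.5, 0.5], [0.2, 0.9]]
tgt_k = [[0.3, 0.7], [0.8, 0.1]]
tgt_v = [[1.0, 2.0], [3.0, 4.0]]

# Cache the source map for layer 0, then run the editing branch twice:
# once with the map replaced, once without.
source_maps = {0: attention_map(src_q, src_k)}
edited = self_attention(0, tgt_q, tgt_k, tgt_v, source_maps, replace_layers={0})
unedited = self_attention(0, tgt_q, tgt_k, tgt_v, source_maps, replace_layers=set())
```

In this sketch the source map decides how tokens attend to each other (the layout), while the target values supply the content, which mirrors the summary's claim that self-attention maps carry the geometric and shape details of the source image. Which layers and denoising steps to include in `replace_layers` is left open here, since the summary says only "specified attention layers".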