Prompt-to-Prompt Image Editing with Cross Attention Control

2 Aug 2022 | Amir Hertz, Ron Mokady, Jay Tenenbaum, Kfir Aberman, Yael Pritch, and Daniel Cohen-Or
This paper introduces a novel *prompt-to-prompt* image editing framework that allows users to modify images using only textual prompts. The method leverages the cross-attention layers in text-conditioned diffusion models to control the spatial layout and geometry of the generated image. By injecting the cross-attention maps from the original image during the diffusion process, the framework can preserve the original composition and structure while editing the image based on the modified prompt. The authors demonstrate several applications, including localized editing by replacing words, global editing by adding specifications, and controlling the extent of a word's influence. The method is intuitive and requires no additional training or fine-tuning, making it accessible to users. Experiments show high-quality synthesis and fidelity to the edited prompts, even for diverse images and prompts. The paper also discusses limitations and future directions, such as improving inversion accuracy and enabling more precise localized editing.
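Below is a minimal, self-contained sketch of the core mechanism (not the authors' code or any library API): cache the cross-attention maps produced while denoising with the source prompt, then re-inject them while denoising with the edited prompt, so the values come from the new text while the spatial layout follows the original. All names (`CrossAttention`, `stored_map`, `inject`) are illustrative; a real implementation would hook the cross-attention layers of a pretrained text-conditioned diffusion U-Net and inject only for the first portion of the diffusion steps.

```python
# Illustrative sketch of cross-attention map injection, the key idea behind
# prompt-to-prompt editing. Names and shapes are hypothetical, not the paper's code.
import math
import torch
import torch.nn.functional as F


class CrossAttention(torch.nn.Module):
    """Toy cross-attention layer: image features attend to text-token embeddings."""

    def __init__(self, dim: int):
        super().__init__()
        self.to_q = torch.nn.Linear(dim, dim, bias=False)  # queries from pixels
        self.to_k = torch.nn.Linear(dim, dim, bias=False)  # keys from text tokens
        self.to_v = torch.nn.Linear(dim, dim, bias=False)  # values from text tokens
        self.scale = 1.0 / math.sqrt(dim)
        self.stored_map = None  # attention map cached from the source-prompt pass
        self.inject = False     # if True, reuse the stored map instead of recomputing

    def forward(self, x: torch.Tensor, text: torch.Tensor) -> torch.Tensor:
        q, k, v = self.to_q(x), self.to_k(text), self.to_v(text)
        attn = F.softmax(q @ k.transpose(-1, -2) * self.scale, dim=-1)
        if self.inject and self.stored_map is not None:
            # Override with the source prompt's map: layout stays, content changes.
            attn = self.stored_map
        else:
            # Cache the map so the edited-prompt pass can reuse it.
            self.stored_map = attn.detach()
        return attn @ v


# Usage: run the source prompt first (maps are cached), then the edited prompt
# with injection enabled. With a word swap, both prompts share token positions.
torch.manual_seed(0)
layer = CrossAttention(dim=64)
pixels = torch.randn(1, 16, 64)      # 16 spatial features (stand-in for a U-Net activation)
src_tokens = torch.randn(1, 8, 64)   # embeddings of the original prompt
edit_tokens = torch.randn(1, 8, 64)  # embeddings of the edited prompt

_ = layer(pixels, src_tokens)        # source pass: attention maps are stored
layer.inject = True
edited = layer(pixels, edit_tokens)  # edited pass: stored maps steer the new values
```

In the paper, injection is applied only for an initial fraction of the diffusion steps; that fraction acts as a knob trading off fidelity to the original geometry against freedom to realize the edited prompt.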