Visual Style Prompting with Swapping Self-Attention

2024 | Jaeseok Jeong, Junho Kim, Yunjey Choi, Gayoung Lee, Youngjung Uh
The paper introduces a novel approach called Visual Style Prompting, which aims to generate images with specific styles while maintaining the content specified by text prompts. This method addresses the challenge of controlled generation in text-to-image diffusion models (T2I DMs) by using a reference image as a visual prompt. The key idea is to swap the key and value features in the late self-attention layers of the denoising process with those from the reference image, ensuring that the generated images reflect the desired style without content leakage. The approach does not require fine-tuning and can be applied to various T2I DMs. Extensive evaluations show that Visual Style Prompting outperforms existing methods in terms of style similarity, text alignment, content diversity, and content leakage. The method is also compatible with existing techniques like ControlNet and Dreambooth-LoRA, and can handle real images as references. The paper concludes by discussing limitations and future directions, emphasizing the need for ethical considerations in the use of generative models.
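The core mechanism can be illustrated with a minimal NumPy sketch: queries are computed from the generated image's features, while keys and values are swapped in from the reference image's features, so the attention output inherits the reference's style statistics. The function and weight names below are illustrative assumptions, not the authors' implementation; in the actual method this swap is applied only in the late self-attention layers of the diffusion U-Net.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def swapped_self_attention(h_gen, h_ref, W_q, W_k, W_v):
    """Self-attention with swapped keys/values (illustrative sketch).

    h_gen: (N, d) features of the image being generated (content path)
    h_ref: (M, d) features of the style reference image
    W_q, W_k, W_v: (d, d) projection matrices (hypothetical names)
    """
    q = h_gen @ W_q          # queries stay on the content path
    k = h_ref @ W_k          # keys swapped in from the reference
    v = h_ref @ W_v          # values swapped in from the reference
    d = q.shape[-1]
    attn = softmax(q @ k.T / np.sqrt(d))  # (N, M) attention map
    return attn @ v          # (N, d) style-infused output
```

Because only keys and values are replaced, the spatial layout queried by `h_gen` is preserved, which is why the style transfers without the reference's content leaking into the result.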