FreeCustom is a tuning-free method for generating customized images that compose multiple concepts. It requires only one image per concept as input and introduces a multi-reference self-attention (MRSA) mechanism together with a weighted mask strategy, which let the generated image access, and focus more strongly on, the reference concepts. MRSA leverages the contextual interaction of the input concepts to capture the global context of the input images, so the method can rapidly produce high-fidelity images that align precisely with the text prompt and stay consistent with the reference concepts, without any training. Compared against existing approaches on both single-concept customization and multi-concept composition, FreeCustom performs on par with or better than training-based methods while being considerably simpler, and it generates high-quality images robustly across diverse concepts.
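To make the mechanism concrete, the following is a minimal sketch of what one MRSA step could look like, assuming standard scaled dot-product attention inside a diffusion U-Net: queries come from the image being generated, while keys and values are concatenated from the generated image and every reference concept, with a weighted mask emphasizing the concept regions of the references. All names here (mrsa, gen_feats, ref_feats, concept_masks, mask_weight) are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch only: names and the exact weighting scheme are
# hypothetical; the real FreeCustom code may differ.
import torch
from torch import nn


def mrsa(q_proj, k_proj, v_proj, gen_feats, ref_feats, concept_masks, mask_weight=3.0):
    """One multi-reference self-attention step.

    gen_feats:     (B, N, C) features of the image being generated
    ref_feats:     list of (B, N, C) features, one per reference concept
    concept_masks: list of (B, N) masks (1 inside a concept, 0 in background)
    """
    # Queries come from the generated image only.
    q = q_proj(gen_feats)

    # Keys and values are concatenated from the generated image and all
    # references, so the generated image can "see" every reference concept.
    k = torch.cat([k_proj(gen_feats)] + [k_proj(r) for r in ref_feats], dim=1)
    v = torch.cat([v_proj(gen_feats)] + [v_proj(r) for r in ref_feats], dim=1)

    scale = q.shape[-1] ** -0.5
    attn = (q @ k.transpose(-2, -1)) * scale          # (B, N, N * (1 + num_refs))

    # Weighted mask strategy: boost attention to the concept regions of the
    # references and strongly suppress their backgrounds; the generated image
    # itself is left unweighted.
    ones = torch.ones_like(concept_masks[0])
    weights = torch.cat([ones] + [m * mask_weight for m in concept_masks], dim=1)
    attn = attn + torch.log(weights.unsqueeze(1) + 1e-8)

    return attn.softmax(dim=-1) @ v


if __name__ == "__main__":
    B, N, C = 1, 64, 320
    q_proj, k_proj, v_proj = (nn.Linear(C, C) for _ in range(3))
    gen = torch.randn(B, N, C)
    refs = [torch.randn(B, N, C) for _ in range(2)]            # two reference concepts
    masks = [torch.randint(0, 2, (B, N)).float() for _ in range(2)]
    print(mrsa(q_proj, k_proj, v_proj, gen, refs, masks).shape)  # torch.Size([1, 64, 320])
```

Applying the mask weights in log space before the softmax is just one convenient way to realize the weighting; the original method may apply the masks to the attention maps differently.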
FreeCustom is applicable to various diffusion-based models and can be directly applied to other base models, making it flexible and robust. Evaluated with metrics covering image similarity, image-text alignment, and image quality, it achieves results comparable to other methods on single-concept customization and shows clear advantages when multiple concepts are combined, preserving the identity of the given concepts while producing images that align closely with the intended composition. Because no additional training is required, the method is also time-efficient. In user studies it is rated superior in image quality, identity fidelity, and text alignment, and it extends to applications such as appearance transfer and combination with existing methods. Supported by extensive experiments, FreeCustom addresses the challenges of multi-concept customized composition and enables flexible combinations of objects from various categories.
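As one illustration of how such an evaluation could be scored, the snippet below computes a CLIP-based image-text alignment value. The backbone choice and the function clip_text_alignment are assumptions made for this sketch, not the paper's exact evaluation protocol.

```python
# Illustrative sketch only: a CLIP-based image-text alignment score. The exact
# metrics, backbones, and protocol used in the paper's evaluation may differ.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")


def clip_text_alignment(image: Image.Image, prompt: str) -> float:
    """Cosine similarity between CLIP embeddings of a generated image and its prompt."""
    inputs = processor(text=[prompt], images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        img_emb = model.get_image_features(pixel_values=inputs["pixel_values"])
        txt_emb = model.get_text_features(input_ids=inputs["input_ids"],
                                          attention_mask=inputs["attention_mask"])
    img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
    txt_emb = txt_emb / txt_emb.norm(dim=-1, keepdim=True)
    return float((img_emb * txt_emb).sum())


# Example usage (hypothetical file and prompt):
# clip_text_alignment(Image.open("generated.png"), "a cat wearing a hat")
```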