MC²: Multi-concept Guidance for Customized Multi-concept Generation


12 Apr 2024 | Jiaxiu Jiang, Yabo Zhang, Kailai Feng, Xiaohu Wu, and Wangmeng Zuo
This paper introduces MC², a method for customized multi-concept generation that enables the seamless integration of heterogeneous single-concept customized models. MC² improves flexibility and fidelity by decoupling the requirements on model architecture through inference-time optimization, allowing different single-concept customized models to be combined. It adaptively refines the attention weights between visual and textual tokens, directing image regions to focus on their associated words while diminishing the impact of irrelevant ones. Extensive experiments show that MC² surpasses previous methods that require additional training in terms of consistency with both the input prompts and the reference images. MC² can also be extended to enhance the compositional capability of text-to-image generation, yielding appealing results. The code is publicly available at https://github.com/JIANGJiaXiu/MC-2.

Keywords: Text-to-image generation · Customized multi-concept generation · Compositional generation

Customized multi-concept generation aims to synthesize instantiations of user-specified concepts. Existing methods face limitations in flexibility and fidelity when extended to multiple customized concepts. MC² addresses these issues by integrating various single-concept customized models without additional training. Its multi-concept guidance (MCG) adaptively refines the attention weights between visual and textual tokens during sampling, so that each image region attends to its associated words while the influence of irrelevant words is suppressed. The same mechanism can also be extended to improve the compositional generation ability of existing text-to-image diffusion models.
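To make the idea of inference-time attention guidance concrete, the sketch below shows a generic attention-refinement step in PyTorch. It is an illustrative approximation, not the authors' implementation: the exact MC² loss, the way cross-attention maps are extracted from the denoiser, and the helper attn_from_latent are all assumptions; only the overall pattern (compute a loss over cross-attention weights, then nudge the noisy latent with its gradient at each denoising step) reflects the kind of guidance described above.

```python
import torch

def attention_guidance_loss(attn, token_groups):
    """attn: (num_pixels, num_tokens) cross-attention weights at one step.
    token_groups: one list of token indices per customized concept.
    Illustrative loss only (not the exact MC2 objective): within each concept's
    dominant region, strengthen attention to that concept's tokens and suppress
    attention leaking to the other concepts."""
    per_concept = torch.stack(
        [attn[:, idx].sum(dim=-1) for idx in token_groups], dim=-1
    )  # (num_pixels, num_concepts)
    dominant = per_concept.argmax(dim=-1)  # concept each pixel attends to most
    loss = attn.new_zeros(())
    for g in range(per_concept.shape[-1]):
        region = per_concept[dominant == g]   # pixels assigned to concept g
        if region.numel() == 0:
            continue
        own = region[:, g]                    # attention to its own tokens
        leaked = region.sum(dim=-1) - own     # attention to the other concepts
        loss = loss + (1.0 - own.mean()) + leaked.mean()
    return loss

def guided_latent_update(latent, attn_from_latent, token_groups, step_size=0.1):
    """One guidance update at a denoising step: move the noisy latent in the
    direction that reduces the attention loss. `attn_from_latent` is a
    hypothetical callable that runs the denoiser on `latent` and returns an
    aggregated (num_pixels, num_tokens) cross-attention map."""
    latent = latent.detach().requires_grad_(True)
    loss = attention_guidance_loss(attn_from_latent(latent), token_groups)
    (grad,) = torch.autograd.grad(loss, latent)
    return (latent - step_size * grad).detach()
```

In an actual sampler, this update would be interleaved with the ordinary denoising step, and token_groups would map each customized concept to the token positions of its placeholder word in the prompt.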
MC² is evaluated on the CustomConcept101 dataset and on benchmark datasets for compositional generation. Quantitative evaluations show that it outperforms existing methods in both subject fidelity and prompt fidelity, and a user study confirms that it surpasses the baselines in text alignment and image alignment. Ablation studies show that the proposed loss terms improve fidelity to the reference images. The method is implemented on the Stable Diffusion v1-5 model, with LoRA models serving as the single-concept customized components, and it generates images containing multiple customized concepts without confusing the attributes belonging to each object.
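For context, the snippet below shows one common way to load Stable Diffusion v1-5 together with several single-concept LoRA models using the diffusers library. This naive multi-adapter combination is what MC² aims to improve upon: MC² keeps the single-concept models separate and composes them through guidance at inference time rather than by merging adapter weights. The LoRA paths, adapter names, and placeholder prompt tokens are hypothetical, and the SD v1-5 repository id may need to be swapped for an available mirror.

```python
import torch
from diffusers import StableDiffusionPipeline

# Base model used in the paper's implementation (Stable Diffusion v1-5).
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Hypothetical single-concept LoRA checkpoints, one per customized concept.
pipe.load_lora_weights("path/to/lora_concept_a", adapter_name="concept_a")
pipe.load_lora_weights("path/to/lora_concept_b", adapter_name="concept_b")

# Naive baseline: activate both adapters with fixed weights.  MC2 instead
# steers generation at inference time via multi-concept guidance on the
# cross-attention maps, without merging the adapters.
pipe.set_adapters(["concept_a", "concept_b"], adapter_weights=[0.8, 0.8])

image = pipe(
    "a photo of a <concept_a> next to a <concept_b> in a garden",
    num_inference_steps=30,
).images[0]
image.save("multi_concept.png")
```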