Multi-LoRA Composition for Image Generation

26 Feb 2024 | Ming Zhong, Yelong Shen, Shuhang Wang, Yadong Lu, Yizhu Jiao, Siru Ouyang, Donghan Yu, Jiawei Han, Weizhu Chen
This paper explores composing multiple Low-Rank Adaptation (LoRA) modules in text-to-image models to generate complex, personalized images. Existing methods often struggle to integrate multiple LoRAs, which leads to instability and poor image quality. To address this, the authors propose two training-free methods, LoRA Switch and LoRA Composite. LoRA Switch alternates between the LoRAs during denoising, activating a different LoRA at each denoising step, while LoRA Composite incorporates all LoRAs simultaneously at every step to guide image synthesis.
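The two strategies can be contrasted at the level of a single denoising step. The following is a minimal pure-Python sketch, not the authors' implementation. It assumes a hypothetical `predict(x, step, lora, conditional)` function standing in for the text-to-image model with one LoRA loaded, applies classifier-free guidance (`cfg`), and assumes that LoRA Composite averages the guided predictions of all LoRAs with equal weight. The `tau` parameter (steps between switches) is also an illustrative assumption; `tau=1` corresponds to switching at every step.

```python
from typing import Callable, List, Sequence

# Hypothetical model interface: (latent, step, lora_index, conditional) -> noise estimate.
NoiseFn = Callable[[List[float], int, int, bool], List[float]]


def cfg(uncond: Sequence[float], cond: Sequence[float], scale: float) -> List[float]:
    """Classifier-free guidance: uncond + scale * (cond - uncond)."""
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]


def switch_active_lora(step: int, num_loras: int, tau: int = 1) -> int:
    """Index of the single LoRA active at `step`; rotates every `tau` steps (assumed)."""
    return (step // tau) % num_loras


def switch_noise(predict: NoiseFn, x: List[float], step: int,
                 num_loras: int, scale: float, tau: int = 1) -> List[float]:
    """LoRA Switch (sketch): guide with only the currently active LoRA."""
    i = switch_active_lora(step, num_loras, tau)
    return cfg(predict(x, step, i, False), predict(x, step, i, True), scale)


def composite_noise(predict: NoiseFn, x: List[float], step: int,
                    num_loras: int, scale: float) -> List[float]:
    """LoRA Composite (sketch): guide with every LoRA, then average (equal weights assumed)."""
    guided = [cfg(predict(x, step, i, False), predict(x, step, i, True), scale)
              for i in range(num_loras)]
    return [sum(vals) / num_loras for vals in zip(*guided)]


def toy_predict(x: List[float], step: int, lora: int, conditional: bool) -> List[float]:
    """Illustrative stand-in for a denoiser with LoRA `lora` loaded (not a real model)."""
    bias = 0.1 * (lora + 1) + (0.05 if conditional else 0.0)
    return [0.5 * v + bias for v in x]


print([switch_active_lora(t, 3) for t in range(6)])         # [0, 1, 2, 0, 1, 2]
print([switch_active_lora(t, 3, tau=2) for t in range(6)])  # [0, 0, 1, 1, 2, 2]
print(composite_noise(toy_predict, [0.0, 1.0], 0, 2, 7.5))
```

In this sketch, LoRA Switch makes one conditional and one unconditional model call per step, while LoRA Composite makes two calls per LoRA, so its per-step cost grows with the number of LoRAs.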
The methods are evaluated on ComposLoRA, a comprehensive testbed of 480 composition sets spanning 6 LoRA categories and 2 visual styles (realistic and anime). GPT-4V serves as the evaluator, scoring both image quality and composition quality. Both LoRA Switch and LoRA Composite outperform the LoRA Merge baseline, and their advantage grows as the number of LoRAs increases. Human evaluations corroborate these results, confirming the methods' superior performance in composing complex images. The study contributes a new benchmark for LoRA-based composable image generation and highlights the importance of keeping each LoRA's weights intact, rather than merging them, during denoising to obtain stable, high-quality images.
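For contrast, the LoRA Merge baseline folds all LoRAs into a single set of weights before sampling. The sketch below assumes the common weight-merging formulation W' = W + sum_i w_i * (B_i @ A_i), with pure-Python lists standing in for weight tensors; `merge_loras` and `lora_delta` are hypothetical helper names, not a library API.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def lora_delta(B: Matrix, A: Matrix, weight: float) -> Matrix:
    """Weighted low-rank update weight * (B @ A); B is d x r, A is r x k."""
    return [[weight * v for v in row] for row in matmul(B, A)]


def merge_loras(W: Matrix, loras: Sequence[Tuple[Matrix, Matrix]],
                weights: Sequence[float]) -> Matrix:
    """LoRA Merge baseline (sketch): fold every weighted update into one matrix."""
    merged = [row[:] for row in W]  # copy so the base weights stay untouched
    for (B, A), w in zip(loras, weights):
        for i, row in enumerate(lora_delta(B, A, w)):
            for j, v in enumerate(row):
                merged[i][j] += v
    return merged


W = [[0.0, 0.0], [0.0, 0.0]]
lora_a = ([[1.0], [0.0]], [[1.0, 0.0]])  # rank-1 update touching only W[0][0]
lora_b = ([[0.0], [1.0]], [[0.0, 1.0]])  # rank-1 update touching only W[1][1]
print(merge_loras(W, [lora_a, lora_b], [0.5, 0.5]))  # [[0.5, 0.0], [0.0, 0.5]]
```

With equal weights of 0.5, each LoRA's update enters the merged matrix at half strength, and that single merged model is then used at every denoising step. By contrast, LoRA Switch and LoRA Composite keep each LoRA's weights separate and unmodified.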