Understanding Multi-LoRA Composition for Image Generation

This paper introduces two training-free methods for multi-LoRA composition in image generation: LoRA SWITCH and LoRA COMPOSITE. Rather than manipulating LoRA weights, both methods operate on the decoding process, which addresses limitations of existing LoRA merging techniques. LoRA SWITCH alternates between different LoRAs at each denoising step, while LoRA COMPOSITE integrates all LoRAs simultaneously to guide image synthesis. The methods are evaluated on a new testbed, ComposLoRA, which includes 480 composition sets across six LoRA categories. Both methods outperform the prevalent LoRA merging approach, particularly as the number of LoRAs increases. LoRA SWITCH excels in composition quality, and LoRA COMPOSITE in image quality. The study also reports that GPT-4V is an effective evaluator for image generation tasks, while discussing its potential biases and the importance of accounting for positional bias in comparative evaluations. Overall, the paper contributes a decoding-centric approach to multi-LoRA composition and offers a new standard for evaluating LoRA-based composable image generation.
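The difference between the two decoding strategies can be illustrated with a toy sketch. This is not the authors' implementation: the "LoRAs" here are hypothetical stand-in functions that return a guidance vector for a latent, the `interval` parameter and the update rule are illustrative assumptions, and LoRA COMPOSITE is approximated by a plain average of the per-LoRA guidance, which the summary above does not specify.

```python
from typing import Callable, List, Sequence

Vector = List[float]
# Hypothetical stand-in for "the denoiser with one LoRA attached":
# maps (latent, step) to a guidance vector of the same length.
GuidanceFn = Callable[[Vector, int], Vector]


def switch_schedule(num_loras: int, num_steps: int, interval: int = 1) -> List[int]:
    """Index of the single active LoRA at each denoising step (round-robin).

    interval=1 switches at every step; larger values are an assumed variation.
    """
    return [(step // interval) % num_loras for step in range(num_steps)]


def composite_guidance(loras: Sequence[GuidanceFn], latent: Vector, step: int) -> Vector:
    """Combine guidance from every LoRA at one step (assumption: element-wise mean)."""
    outputs = [lora(latent, step) for lora in loras]
    return [sum(values) / len(outputs) for values in zip(*outputs)]


def denoise(loras: Sequence[GuidanceFn], latent: Vector, num_steps: int,
            mode: str, step_size: float = 0.1) -> Vector:
    """Toy denoising loop: 'switch' uses one LoRA per step, 'composite' uses all."""
    if mode not in ("switch", "composite"):
        raise ValueError(f"unknown mode: {mode!r}")
    schedule = switch_schedule(len(loras), num_steps)
    for step in range(num_steps):
        if mode == "switch":
            guidance = loras[schedule[step]](latent, step)
        else:
            guidance = composite_guidance(loras, latent, step)
        # Illustrative update rule only; a real sampler would use its scheduler here.
        latent = [x - step_size * g for x, g in zip(latent, guidance)]
    return latent


def make_toy_lora(target: float) -> GuidanceFn:
    """Toy LoRA whose guidance pulls every latent value toward `target`."""
    return lambda latent, step: [x - target for x in latent]


if __name__ == "__main__":
    toy_loras = [make_toy_lora(1.0), make_toy_lora(3.0)]
    print(switch_schedule(len(toy_loras), num_steps=6))
    print(denoise(toy_loras, [0.0, 2.0], num_steps=6, mode="switch"))
    print(denoise(toy_loras, [0.0, 2.0], num_steps=6, mode="composite"))
```

In both modes the base model weights stay untouched and only the per-step guidance changes, which is the sense in which these methods are decoding-centric rather than weight-merging.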

Multi-LoRA Composition for Image Generation

26 Feb 2024 | Ming Zhong¹, Yelong Shen², Shuohang Wang², Yadong Lu², Yizhu Jiao¹, Siru Ouyang¹, Donghan Yu², Jiawei Han¹, Weizhu Chen²