This paper introduces Conditional Mixture-of-LoRA (MixLoRA), a novel approach for multimodal instruction tuning that addresses task interference in parameter-efficient fine-tuning. Multimodal Large Language Models (MLLMs) have shown strong performance across diverse tasks, but their zero-shot generalization to new multimodal tasks remains a challenge. Multimodal instruction tuning, which fine-tunes pre-trained models on diverse multimodal tasks through instructions, has emerged as a promising strategy for achieving such generalization. However, conventional parameter-efficient fine-tuning methods such as Low-Rank Adaptation (LoRA) struggle with the diversity and complexity of multimodal tasks, leading to task interference and degraded performance.
To address this, the paper proposes MixLoRA, which dynamically constructs low-rank adaptation matrices tailored to each input instance. Unlike conventional LoRA, which shares a single pair of low-rank matrices across all tasks, MixLoRA selects decomposition factors from two collections to build different adaptation matrices for different inputs. Task interference is reduced because the factors selected for the LoRA A and B matrices are not only tailored to the input but also cohesively aligned.
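To make the factor-selection mechanism concrete, the following PyTorch sketch illustrates one plausible reading of it: two learnable pools of rank-1 decomposition factors, per-instance routers that score each pool, and the top-r factors gathered and gated to form instance-specific LoRA A and B matrices. The class, parameter names (factor_pool_a, router_a, etc.), mean-pooled routing input, and softmax gating are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ConditionalMixtureLoRA(nn.Module):
    """Sketch of instance-conditioned low-rank adaptation.

    Two pools of decomposition factors are maintained; for every input
    instance, routers score the pools and the top-r factors are stacked
    into instance-specific LoRA A and B matrices. Names and routing
    details are illustrative assumptions, not the paper's implementation.
    """

    def __init__(self, d_in, d_out, rank=4, pool_size=32, scaling=1.0):
        super().__init__()
        self.rank = rank
        self.scaling = scaling
        # Pools of rank-1 decomposition factors (the two "collections").
        # B-side factors start at zero so the initial update is zero, as in LoRA.
        self.factor_pool_a = nn.Parameter(torch.randn(pool_size, d_in) * 0.02)  # rows of A
        self.factor_pool_b = nn.Parameter(torch.zeros(pool_size, d_out))        # columns of B
        # Routers score every factor from a pooled instance representation.
        self.router_a = nn.Linear(d_in, pool_size)
        self.router_b = nn.Linear(d_in, pool_size)

    def forward(self, x):
        # x: (batch, seq_len, d_in); the frozen base-layer output is added outside.
        instance_repr = x.mean(dim=1)                        # (batch, d_in)
        scores_a = self.router_a(instance_repr)              # (batch, pool_size)
        scores_b = self.router_b(instance_repr)

        # Select the top-r factors per instance and gate them by their
        # softmaxed scores so the routers remain trainable.
        top_a = scores_a.topk(self.rank, dim=-1)
        top_b = scores_b.topk(self.rank, dim=-1)
        gates_a = F.softmax(top_a.values, dim=-1).unsqueeze(-1)   # (batch, rank, 1)
        gates_b = F.softmax(top_b.values, dim=-1).unsqueeze(-1)
        A = self.factor_pool_a[top_a.indices] * gates_a          # (batch, rank, d_in)
        B = self.factor_pool_b[top_b.indices] * gates_b          # (batch, rank, d_out)

        # Instance-specific low-rank update: delta = (x A^T) B.
        delta = torch.einsum("bsd,brd->bsr", x, A)               # (batch, seq, rank)
        delta = torch.einsum("bsr,bro->bso", delta, B)           # (batch, seq, d_out)
        return self.scaling * delta


# Usage: add the instance-conditioned update to a frozen linear layer's output.
if __name__ == "__main__":
    base = nn.Linear(768, 768)
    for p in base.parameters():
        p.requires_grad_(False)
    mixlora = ConditionalMixtureLoRA(d_in=768, d_out=768, rank=4, pool_size=32)
    x = torch.randn(2, 16, 768)
    y = base(x) + mixlora(x)
    print(y.shape)  # torch.Size([2, 16, 768])
```

Gating the selected factors by their softmaxed router scores keeps the routers differentiable through the top-r selection; how the paper actually routes factors and keeps the A- and B-side selections cohesively aligned may differ from this sketch.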
The paper evaluates MixLoRA on MME and seven additional multimodal evaluation datasets. Experimental results show that MixLoRA consistently outperforms LoRA across these tasks, even when LoRA is given the same or a higher rank. The dynamic factor selection mechanism also enables the model to generalize to unseen tasks through adaptive factor activation, demonstrating its effectiveness in mitigating task interference. The study highlights the importance of effective adaptation strategies in parameter-efficient multimodal instruction tuning and underscores the potential of MixLoRA to improve robustness and versatility on complex multimodal tasks.