LLaVA-MoLE: Sparse Mixture of LoRA Experts for Mitigating Data Conflicts in Instruction Finetuning MLLMs

30 Jan 2024 | Shaoxiang Chen, Zequn Jie, Lin Ma
This paper introduces LLaVA-MoLE, a sparse mixture of LoRA experts (MoLE) designed to mitigate data conflicts in instruction finetuning of multimodal large language models (MLLMs). The authors identify that mixing instruction data from different domains can cause performance drops on specific tasks due to data conflicts. To address this, they propose an MoE design that extends the LoRA method by creating a set of LoRA experts for the MLP layer, with each token routed to the top-1 expert based on a routing function. This allows adaptive choices for tokens from different domains while keeping training and inference costs roughly constant compared to the original LoRA method. Experiments show that LLaVA-MoLE effectively mitigates data conflicts and achieves consistent performance gains over plain-LoRA baselines, even outperforming a plain-LoRA baseline trained with twice as many samples. The approach enables efficient handling of diverse mixtures of instruction datasets without significant computational overhead.
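
To make the routing mechanism concrete, below is a minimal PyTorch sketch of a sparse Mixture-of-LoRA-Experts linear layer with token-wise top-1 routing. It is illustrative only: the class and hyperparameter names (MoLELinear, num_experts, rank, alpha) are assumptions rather than the paper's code, and for clarity every expert's low-rank delta is computed densely and then masked, whereas a real implementation would dispatch tokens sparsely to realize the roughly constant per-token cost the paper claims.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MoLELinear(nn.Module):
    """Sketch of a sparse Mixture-of-LoRA-Experts linear layer.

    A frozen base projection is augmented with several low-rank (LoRA)
    experts; a learned router assigns each token to its top-1 expert.
    Names and default hyperparameters are illustrative assumptions.
    """

    def __init__(self, in_features, out_features, num_experts=3, rank=8, alpha=16.0):
        super().__init__()
        self.num_experts = num_experts
        self.scaling = alpha / rank
        self.base = nn.Linear(in_features, out_features)
        for p in self.base.parameters():
            p.requires_grad_(False)  # pretrained projection stays frozen
        # One (A, B) low-rank pair per expert; B starts at zero as in standard LoRA.
        self.lora_A = nn.Parameter(torch.randn(num_experts, in_features, rank) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(num_experts, rank, out_features))
        self.router = nn.Linear(in_features, num_experts)  # token-wise routing function

    def forward(self, x):
        # x: (batch, seq_len, in_features)
        probs = F.softmax(self.router(x), dim=-1)  # (batch, seq_len, num_experts)
        expert_idx = probs.argmax(dim=-1)          # hard top-1 routing per token
        out = self.base(x)
        for e in range(self.num_experts):
            mask = (expert_idx == e).unsqueeze(-1).float()  # tokens routed to expert e
            gate = probs[..., e:e + 1] * mask      # gradients reach the router via the gate
            delta = (x @ self.lora_A[e]) @ self.lora_B[e]   # this expert's low-rank update
            out = out + self.scaling * gate * delta
        return out


# Usage: one such layer would replace the plain-LoRA adaptation of an MLP projection.
layer = MoLELinear(4096, 11008, num_experts=3)
tokens = torch.randn(2, 16, 4096)
print(layer(tokens).shape)  # torch.Size([2, 16, 11008])
```

Because only one expert is active per token, the number of effective LoRA parameters applied to each token stays constant as experts are added, which is how this design keeps costs close to a single plain-LoRA adapter.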