Mixture of LoRA Experts

21 Apr 2024 | Xun Wu, Shaohan Huang, Furu Wei
This paper introduces MoLE (Mixture of LoRA Experts), a method for dynamically and efficiently composing multiple trained LoRAs while preserving their individual characteristics. MoLE treats each layer of the trained LoRAs as a distinct expert and uses a learnable gating function to determine optimal composition weights for a specified domain objective. This hierarchical, layer-wise weight control lets MoLE enhance desirable characteristics while mitigating less favorable ones, yielding more effective LoRA composition, and it stays flexible about which LoRAs are combined at reduced computational cost.

During training, MoLE learns only the gating functions over the trained LoRAs while keeping all other parameters frozen, so the additional computational cost is minimal. During inference, MoLE offers two modes: one in which all trained LoRAs contribute according to the learned gating weights, and another in which unwanted LoRAs are manually masked and the remaining weights are recalculated and redistributed proportionally.

MoLE is validated in both the NLP and V&L domains, where it outperforms existing LoRA composition methods in text and image alignment while effectively retaining the characteristics of the individual LoRAs. It also remains robust and maintains superior composition quality as the number of LoRAs increases, making it a versatile and effective approach to LoRA composition across different scenarios.
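To make the mechanism concrete, here is a minimal PyTorch-style sketch of layer-wise gating over frozen LoRA experts, including the masked-inference mode in which remaining weights are renormalized. All names here (LoRAExpert, MoLELayer, expert_mask, etc.) are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of layer-wise LoRA-expert gating in the spirit of MoLE.
# Assumption: each trained LoRA is frozen; only the gate is learned.
import torch
import torch.nn as nn
import torch.nn.functional as F


class LoRAExpert(nn.Module):
    """A single frozen LoRA adapter: delta(x) = (x @ A @ B) * scaling."""
    def __init__(self, in_dim, out_dim, rank=8, scaling=1.0):
        super().__init__()
        self.A = nn.Parameter(torch.randn(in_dim, rank) * 0.01, requires_grad=False)
        self.B = nn.Parameter(torch.zeros(rank, out_dim), requires_grad=False)
        self.scaling = scaling

    def forward(self, x):
        return (x @ self.A @ self.B) * self.scaling


class MoLELayer(nn.Module):
    """A frozen base linear layer plus N frozen LoRA experts and a learnable gate."""
    def __init__(self, base_linear, lora_experts):
        super().__init__()
        self.base = base_linear                      # frozen pretrained layer
        self.experts = nn.ModuleList(lora_experts)   # frozen trained LoRAs
        # The gate is the only trainable part: one weight per expert.
        self.gate = nn.Linear(base_linear.in_features, len(lora_experts))

    def forward(self, x, expert_mask=None):
        logits = self.gate(x)                        # (..., num_experts)
        if expert_mask is not None:
            # Inference mode 2: mask unwanted LoRAs; the softmax below
            # renormalizes so the remaining weights are redistributed
            # proportionally among the active experts.
            logits = logits.masked_fill(~expert_mask, float("-inf"))
        weights = F.softmax(logits, dim=-1)

        out = self.base(x)
        for i, expert in enumerate(self.experts):
            out = out + weights[..., i:i + 1] * expert(x)
        return out
```

In this sketch, training would update only `self.gate` (everything else has `requires_grad=False`), matching the paper's claim of minimal training cost; passing a boolean `expert_mask` at inference illustrates the manual-masking mode, while omitting it uses all learned gating weights.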