Retrieval-Augmented Mixture of LoRA Experts for Uploadable Machine Learning

16 Jul 2024 | Ziyu Zhao, Leilei Gan, Guoyin Wang, Yuwei Hu, Tao Shen, Hongxia Yang, Fei Wu, Kun Kuang
RAMoLE is a retrieval-augmented mixture of LoRA experts framework designed for Uploadable Machine Learning (UML), which enables dynamic and scalable LoRA adaptation for diverse downstream tasks. The framework consists of three key components: (1) Input-aware LoRA retrieval, which identifies relevant LoRAs using sentence embeddings and instruction fine-tuning; (2) On-the-fly Mixture of LoRA Experts (MoLE), which dynamically assigns weights to retrieved LoRAs using a RouterLoRA and cross-attention mechanism; and (3) Batch inference, which efficiently processes multiple LoRAs for heterogeneous requests. RAMoLE outperforms existing baselines in mixed-task scenarios, demonstrating strong generalization capabilities for unseen LoRAs and tasks. The framework addresses the challenges of dynamically expanding LoRA pools and heterogeneous downstream requests by enabling flexible and adaptive LoRA routing. Experimental results show that RAMoLE achieves superior performance in both in-distribution (IID) and out-of-distribution (OOD) settings, with the RouterLoRA effectively differentiating between LoRAs and improving model performance. The framework also incorporates a novel batch inference strategy that enhances efficiency by leveraging LoRA mapping matrices. Overall, RAMoLE provides a flexible and scalable solution for UML, enabling personalized services for diverse downstream tasks.
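
To make the retrieve-then-mix pipeline concrete, here is a minimal Python sketch of the two core steps: embedding-based LoRA retrieval and a weighted mixture of the retrieved low-rank updates. It is not the paper's implementation: the cosine-similarity lookup stands in for the instruction-fine-tuned retriever, the dot-product scoring stands in for the learned RouterLoRA cross-attention, and all names (`lora_pool`, `retrieve_loras`, `mole_forward`) and dimensions are illustrative.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
D_IN, D_OUT, RANK, NUM_LORAS, TOP_K, EMB_DIM = 32, 32, 4, 6, 3, 16

# Hypothetical LoRA pool: each uploaded LoRA carries an embedding of its
# training task plus its low-rank factors A (d_in x r) and B (r x d_out).
lora_pool = [
    {
        "emb": F.normalize(torch.randn(EMB_DIM), dim=0),
        "A": torch.randn(D_IN, RANK) * 0.02,
        "B": torch.randn(RANK, D_OUT) * 0.02,
    }
    for _ in range(NUM_LORAS)
]

base_layer = torch.nn.Linear(D_IN, D_OUT, bias=False)  # frozen base projection


def retrieve_loras(query_emb: torch.Tensor, pool: list, k: int) -> list:
    """Input-aware retrieval: rank LoRAs by cosine similarity between the
    request's sentence embedding and each LoRA's stored embedding."""
    sims = torch.stack(
        [F.cosine_similarity(query_emb, e["emb"], dim=0) for e in pool]
    )
    return torch.topk(sims, k).indices.tolist()


def mole_forward(hidden: torch.Tensor, retrieved: list, pool: list) -> torch.Tensor:
    """On-the-fly mixture: score each retrieved LoRA against the hidden state
    (a dot-product stand-in for the learned RouterLoRA cross-attention),
    softmax the scores into mixture weights, and add the weighted low-rank
    updates on top of the frozen base projection."""
    scores = torch.stack([(hidden @ pool[i]["A"]).norm() for i in retrieved])
    weights = torch.softmax(scores, dim=0)
    out = base_layer(hidden)
    for w, i in zip(weights, retrieved):
        out = out + w * (hidden @ pool[i]["A"] @ pool[i]["B"])
    return out


# Usage: embed the incoming request (a random placeholder vector here),
# retrieve the top-k LoRAs, then run the mixture forward pass.
query_emb = F.normalize(torch.randn(EMB_DIM), dim=0)
hidden = torch.randn(D_IN)
retrieved = retrieve_loras(query_emb, lora_pool, TOP_K)
print("retrieved LoRA ids:", retrieved)
print("output shape:", mole_forward(hidden, retrieved, lora_pool).shape)
```

In the full framework, the per-token routing weights are produced by a trained RouterLoRA rather than the fixed heuristic above, and batch inference groups requests so that different LoRA subsets can be applied within one forward pass via LoRA mapping matrices.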