Retrieval-Augmented Mixture of LoRA Experts for Uploadable Machine Learning

16 Jul 2024 | Ziyu Zhao, Leilei Gan, Guoyin Wang, Yuwei Hu, Tao Shen, Hongxia Yang, Fei Wu, Kun Kuang
RAMoLE is a retrieval-augmented mixture of LoRA experts framework designed for Uploadable Machine Learning (UML), which enables dynamic and scalable LoRA adaptation for diverse downstream tasks. The framework consists of three key components: (1) Input-aware LoRA retrieval, which identifies relevant LoRAs using sentence embeddings and instruction fine-tuning; (2) On-the-fly Mixture of LoRA Experts (MoLE), which dynamically assigns weights to retrieved LoRAs using a RouterLoRA and cross-attention mechanism; and (3) Batch inference, which efficiently processes multiple LoRAs for heterogeneous requests. RAMoLE outperforms existing baselines in mixed-task scenarios, demonstrating strong generalization capabilities for unseen LoRAs and tasks. The framework addresses the challenges of dynamically expanding LoRA pools and heterogeneous downstream requests by enabling flexible and adaptive LoRA routing. Experimental results show that RAMoLE achieves superior performance in both in-distribution (IID) and out-of-distribution (OOD) settings, with the RouterLoRA effectively differentiating between LoRAs and improving model performance. The framework also incorporates a novel batch inference strategy that enhances efficiency by leveraging LoRA mapping matrices. Overall, RAMoLE provides a flexible and scalable solution for UML, enabling personalized services for diverse downstream tasks.
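
To make the retrieve-then-mix pipeline concrete, here is a minimal Python sketch of the two core steps: embedding-based LoRA retrieval and a weighted mixture of the retrieved low-rank updates. It is not the paper's implementation: the cosine-similarity lookup stands in for the instruction-fine-tuned retriever, the dot-product scoring stands in for the learned RouterLoRA cross-attention, and all names (`lora_pool`, `retrieve_loras`, `mole_forward`) and dimensions are illustrative.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
D_IN, D_OUT, RANK, NUM_LORAS, TOP_K, EMB_DIM = 32, 32, 4, 6, 3, 16

# Hypothetical LoRA pool: each uploaded LoRA carries an embedding of its
# training task plus its low-rank factors A (d_in x r) and B (r x d_out).
lora_pool = [
    {
        "emb": F.normalize(torch.randn(EMB_DIM), dim=0),
        "A": torch.randn(D_IN, RANK) * 0.02,
        "B": torch.randn(RANK, D_OUT) * 0.02,
    }
    for _ in range(NUM_LORAS)
]

base_layer = torch.nn.Linear(D_IN, D_OUT, bias=False)  # frozen base projection


def retrieve_loras(query_emb: torch.Tensor, pool: list, k: int) -> list:
    """Input-aware retrieval: rank LoRAs by cosine similarity between the
    request's sentence embedding and each LoRA's stored embedding."""
    sims = torch.stack(
        [F.cosine_similarity(query_emb, e["emb"], dim=0) for e in pool]
    )
    return torch.topk(sims, k).indices.tolist()


def mole_forward(hidden: torch.Tensor, retrieved: list, pool: list) -> torch.Tensor:
    """On-the-fly mixture: score each retrieved LoRA against the hidden state
    (a dot-product stand-in for the learned RouterLoRA cross-attention),
    softmax the scores into mixture weights, and add the weighted low-rank
    updates on top of the frozen base projection."""
    scores = torch.stack([(hidden @ pool[i]["A"]).norm() for i in retrieved])
    weights = torch.softmax(scores, dim=0)
    out = base_layer(hidden)
    for w, i in zip(weights, retrieved):
        out = out + w * (hidden @ pool[i]["A"] @ pool[i]["B"])
    return out


# Usage: embed the incoming request (a random placeholder vector here),
# retrieve the top-k LoRAs, then run the mixture forward pass.
query_emb = F.normalize(torch.randn(EMB_DIM), dim=0)
hidden = torch.randn(D_IN)
retrieved = retrieve_loras(query_emb, lora_pool, TOP_K)
print("retrieved LoRA ids:", retrieved)
print("output shape:", mole_forward(hidden, retrieved, lora_pool).shape)
```

In the full framework, the per-token routing weights are produced by a trained RouterLoRA rather than the fixed heuristic above, and batch inference groups requests so that different LoRA subsets can be applied within one forward pass via LoRA mapping matrices.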