AdaMoLE: Fine-Tuning Large Language Models with Adaptive Mixture of Low-Rank Adaptation Experts

10 Aug 2024 | Zefang Liu, Jiahua Luo
AdaMoLE is a novel method for fine-tuning large language models (LLMs) by integrating Low-Rank Adaptation (LoRA) with an adaptive Mixture of Experts (MoE) framework. Unlike traditional methods that use a static top-k strategy for activating experts, AdaMoLE dynamically adjusts the activation threshold using a dedicated threshold network, adapting to the varying complexities of different tasks. By replacing a single LoRA in a layer with multiple LoRA experts and integrating a gating function with the threshold mechanism, AdaMoLE effectively selects and activates the most appropriate experts based on the input context. Extensive evaluations across various commonsense reasoning and natural language processing (NLP) tasks show that AdaMoLE outperforms baseline models, highlighting the advantages of its adaptive selection of LoRA experts. The experimental validation confirms AdaMoLE as a robust approach for enhancing LLMs and suggests valuable directions for future research in adaptive expert selection mechanisms, potentially broadening the scope for optimizing model performance across diverse language processing tasks.
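The sketch below illustrates the mechanism described above: a frozen linear layer augmented with several LoRA experts, a gating function that scores the experts per token, and a threshold network that produces an input-dependent activation threshold so that only experts whose gate weight exceeds it contribute. It is a minimal, hypothetical PyTorch rendering of the idea, not the authors' reference implementation; the hyperparameters (`num_experts`, `rank`, `alpha`) and the specific one-layer threshold network are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class AdaMoLELayer(nn.Module):
    """Hypothetical sketch of an adaptive mixture of LoRA experts on a frozen linear layer."""

    def __init__(self, base_linear: nn.Linear, num_experts: int = 4, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base_linear  # frozen pretrained projection
        for p in self.base.parameters():
            p.requires_grad_(False)

        d_in, d_out = base_linear.in_features, base_linear.out_features
        self.scaling = alpha / rank
        # One low-rank (A, B) pair per LoRA expert (initialization chosen for illustration).
        self.lora_A = nn.ParameterList(
            [nn.Parameter(torch.randn(d_in, rank) * 0.01) for _ in range(num_experts)]
        )
        self.lora_B = nn.ParameterList(
            [nn.Parameter(torch.zeros(rank, d_out)) for _ in range(num_experts)]
        )
        # Gating function: scores every expert for each input token.
        self.gate = nn.Linear(d_in, num_experts)
        # Threshold network: predicts an input-dependent activation threshold
        # (a single linear layer with a sigmoid is an assumption, not the paper's exact design).
        self.threshold = nn.Sequential(nn.Linear(d_in, 1), nn.Sigmoid())
        self.max_threshold = 1.0 / num_experts  # keep the threshold in a sensible range

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_in)
        out = self.base(x)
        gate_probs = F.softmax(self.gate(x), dim=-1)      # (..., num_experts)
        tau = self.threshold(x) * self.max_threshold       # (..., 1), adaptive per token
        # Activate only experts whose gate weight exceeds the adaptive threshold.
        active = F.relu(gate_probs - tau)
        weights = active / (active.sum(dim=-1, keepdim=True) + 1e-9)
        for i, (A, B) in enumerate(zip(self.lora_A, self.lora_B)):
            out = out + weights[..., i : i + 1] * (x @ A @ B) * self.scaling
        return out
```

In this reading, a larger gate score relative to the predicted threshold means an expert participates more strongly, while experts below the threshold are silenced entirely, which is how the layer adapts the number of active experts to the input rather than fixing it with a static top-k.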