20 Jul 2024 | Dengchun Li, Yingzi Ma, Naizheng Wang, Zhengmao Ye, Zhiyuan Cheng, Yinghao Tang, Yan Zhang, Lei Duan, Jie Zuo, Cal Yang, and Mingjie Tang
MixLoRA is a parameter-efficient mixture-of-experts (MoE) method that enhances large language model (LLM) fine-tuning with LoRA-based experts. It constructs a sparse MoE model on top of a frozen pre-trained dense model by inserting multiple LoRA-based experts into the feed-forward network (FFN) block and using a top-k router to assign tokens to experts, which keeps both training and inference efficient. Unlike other LoRA-based MoE methods, MixLoRA further improves performance with independent LoRA adapters on the attention layers and an auxiliary load-balance loss that counteracts router imbalance. In multi-task learning scenarios, MixLoRA achieves a 9% accuracy improvement over state-of-the-art PEFT methods. A high-throughput framework is also introduced that reduces GPU memory consumption by 40% and token computation latency by 30% during training and inference. Compared with LoRA and DoRA on LLaMA-2 7B, MixLoRA improves average accuracy by 5.8% in single-task learning and 9.8% in multi-task learning, and it demonstrates superior downstream-task performance across a range of benchmarks. By combining LoRA's resource efficiency with MoE's versatility, MixLoRA enables efficient and effective fine-tuning of LLMs.
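To make the architecture concrete, below is a minimal PyTorch sketch of a MixLoRA-style FFN block: experts share the frozen dense FFN weights and differ only in their trainable LoRA deltas, a top-k router dispatches tokens, and a Switch-Transformer-style auxiliary loss penalizes router imbalance. This is an illustration under stated assumptions, not the authors' implementation; the class names (`LoRAExpert`, `MixLoRAFFN`), the SiLU activation, the rank/alpha defaults, and the exact form of the load-balance loss are all assumptions.

```python
# Minimal sketch of a MixLoRA-style sparse MoE FFN block (illustrative, not the
# reference implementation). Assumes PyTorch and a generic two-projection FFN.
import torch
import torch.nn as nn
import torch.nn.functional as F


class LoRAExpert(nn.Module):
    """One expert: the shared frozen FFN plus a trainable low-rank (LoRA) delta."""

    def __init__(self, base_up: nn.Linear, base_down: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base_up, self.base_down = base_up, base_down  # frozen, shared across experts
        d_model, d_ff = base_up.in_features, base_up.out_features
        self.scaling = alpha / r
        # Trainable LoRA A/B pairs for the up- and down-projections (per expert)
        self.up_a = nn.Linear(d_model, r, bias=False)
        self.up_b = nn.Linear(r, d_ff, bias=False)
        self.down_a = nn.Linear(d_ff, r, bias=False)
        self.down_b = nn.Linear(r, d_model, bias=False)
        nn.init.zeros_(self.up_b.weight)    # start as an identity delta
        nn.init.zeros_(self.down_b.weight)

    def forward(self, x):
        h = self.base_up(x) + self.scaling * self.up_b(self.up_a(x))
        h = F.silu(h)
        return self.base_down(h) + self.scaling * self.down_b(self.down_a(h))


class MixLoRAFFN(nn.Module):
    """Sparse MoE FFN: a top-k router sends each token to k LoRA experts."""

    def __init__(self, base_up: nn.Linear, base_down: nn.Linear,
                 num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(base_up.in_features, num_experts, bias=False)
        self.experts = nn.ModuleList(
            [LoRAExpert(base_up, base_down) for _ in range(num_experts)]
        )

    def forward(self, x):                      # x: (tokens, d_model)
        logits = self.router(x)                # (tokens, num_experts)
        probs = logits.softmax(dim=-1)
        topk_p, topk_i = probs.topk(self.top_k, dim=-1)
        topk_p = topk_p / topk_p.sum(dim=-1, keepdim=True)  # renormalize over chosen experts

        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            token_idx, slot = (topk_i == e).nonzero(as_tuple=True)
            if token_idx.numel() == 0:
                continue  # no tokens routed to this expert
            out[token_idx] += topk_p[token_idx, slot].unsqueeze(-1) * expert(x[token_idx])

        # Auxiliary load-balance loss: product of the fraction of tokens routed to
        # each expert and the mean router probability, summed over experts.
        num_experts = len(self.experts)
        frac_tokens = F.one_hot(topk_i[:, 0], num_experts).float().mean(dim=0)
        frac_probs = probs.mean(dim=0)
        aux_loss = num_experts * (frac_tokens * frac_probs).sum()
        return out, aux_loss
```

In use, `aux_loss` would be scaled by a small coefficient and added to the task loss so that only the router and LoRA parameters receive gradients while the shared dense FFN stays frozen, mirroring the parameter-efficiency argument in the summary above.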