MELoRA: Mini-Ensemble Low-Rank Adapters for Parameter-Efficient Fine-Tuning
**Authors:** Pengjie Ren, Chengshun Shi, Shiguang Wu, Mengqi Zhang, Zhaochun Ren, Maarten de Rijke, Zhumin Chen, Jiahuan Pei
**Institutions:** Shandong University, Leiden University, University of Amsterdam, Centrum Wiskunde & Informatica
**Abstract:**
Parameter-efficient fine-tuning (PEFT) is a popular approach for tailoring large language models (LLMs) to specific tasks, especially as models scale and task diversity grows. Low-rank adaptation (LoRA) is a PEFT method that approximates weight updates with low-rank matrices, sharply reducing the number of trainable parameters. However, LoRA often exhibits a generalization gap relative to full-parameter fine-tuning. MELoRA, a novel method, addresses this issue by stacking multiple mini LoRAs in parallel, each with very few parameters, while together attaining a higher overall rank. This design also encourages diversity among the mini LoRAs, which improves generalization. Theoretical analysis and empirical studies on a range of NLP tasks show that MELoRA achieves better performance with significantly fewer trainable parameters than LoRA, demonstrating its effectiveness.
**Key Contributions:**
- MELoRA: A new method that uses mini-ensemble low-rank adapters to achieve higher ranks with fewer parameters.
- Theoretical analysis showing that MELoRA maintains a higher and more flexible rank at lower parameter complexity.
- Extensive experiments showing that MELoRA outperforms LoRA while using significantly fewer trainable parameters.
**Introduction:**
Large language models (LLMs) are widely used in natural language processing (NLP). Fine-tuning (FT) is a common method to tailor LLMs for specific tasks, but full FT becomes infeasible with large models and diverse tasks. Parameter-efficient fine-tuning (PEFT) methods, such as LoRA, reduce the number of trainable parameters while maintaining computational efficiency. However, LoRA often leads to performance gaps compared to full FT. MELoRA addresses this by stacking multiple mini LoRAs in parallel, ensuring a higher rank without increasing the number of parameters.
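For reference, the standard LoRA reparameterization that MELoRA builds on can be written as follows. This is textbook background (Hu et al., 2021) rather than a formula quoted from the summary above:

```latex
% Standard LoRA update: the pretrained weight W_0 is frozen;
% only the low-rank factors A and B are trained.
h = W_0 x + \Delta W\, x = W_0 x + B A\, x,
\qquad A \in \mathbb{R}^{r \times d},\; B \in \mathbb{R}^{d \times r},\; r \ll d.
% Trainable parameters per adapted weight: 2dr instead of d^2.
```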
**Methodology:**
MELoRA concatenates the outputs of several mini LoRA modules, each operating on a disjoint slice of the hidden dimension. The equivalent weight update is therefore block-diagonal, with one mini LoRA per block, so its rank is the sum of the individual mini LoRA ranks. This yields a higher overall rank with fewer parameters than a single LoRA of the same rank, as the sketch below illustrates. The method is analyzed theoretically and validated experimentally on a range of NLP tasks.
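The following is a minimal PyTorch sketch of this idea, not the authors' released implementation; the class name `MELoRALayer`, the einsum layout, and the zero-initialization of `B` are illustrative assumptions. Each of the `n` mini LoRAs of rank `r` adapts a disjoint `d_model / n` slice of the input, so the equivalent ΔW is block-diagonal with rank `n·r` while using only `2·d_model·r` parameters (a single LoRA of rank `n·r` would need `2·d_model·n·r`):

```python
import torch
import torch.nn as nn

class MELoRALayer(nn.Module):
    """Sketch of a MELoRA adapter: n mini LoRAs, each of rank r, applied to
    disjoint slices of the input and concatenated. The equivalent Delta-W is
    block-diagonal with rank n * r."""

    def __init__(self, d_model: int, n: int, r: int, alpha: float = 1.0):
        super().__init__()
        assert d_model % n == 0, "d_model must be divisible by n"
        self.n, self.d = n, d_model // n   # each mini LoRA sees a d-dim slice
        self.scaling = alpha / r
        # Per mini LoRA: A_i is (r x d), B_i is (d x r).
        self.A = nn.Parameter(torch.randn(n, r, self.d) * 0.01)
        self.B = nn.Parameter(torch.zeros(n, self.d, r))  # zero init => Delta-W = 0 at start

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (..., d_model) -> n slices of size d
        *lead, _ = x.shape
        xs = x.view(*lead, self.n, self.d)                   # (..., n, d)
        down = torch.einsum("...nd,nrd->...nr", xs, self.A)  # (..., n, r)
        up = torch.einsum("...nr,ndr->...nd", down, self.B)  # (..., n, d)
        return up.reshape(*lead, self.n * self.d) * self.scaling
```

As in standard LoRA, this layer's output would be added to the frozen linear layer's output; with `n = 1` it reduces to a vanilla LoRA of rank `r`.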
**Experimental Setup:**
MELoRA is compared with LoRA and other state-of-the-art variants on the GLUE benchmark and the INSTRUCTEVAL evaluation suite. Results show that MELoRA outperforms LoRA with significantly fewer parameters, especially on instruction-following tasks.
**Results:**
- MELoRA achieves superior performance with significantly fewer trainable parameters than LoRA across GLUE and INSTRUCTEVAL tasks, with the clearest gains on instruction following.