LoRA+: Efficient Low Rank Adaptation of Large Models

2024 | Soufiane Hayou, Nikhil Ghosh, Bin Yu
This paper introduces LoRA+, an improved version of the Low-Rank Adaptation (LoRA) method for fine-tuning large pre-trained language models. LoRA+ addresses the suboptimal performance of standard LoRA by assigning different learning rates to the adapter matrices A and B, with a well-chosen fixed ratio between them. This adjustment yields 1–2% performance improvements and up to 2x faster fine-tuning without increasing computational cost.

The authors analyze the limitations of standard LoRA in the large-width regime, where using the same learning rate for A and B leads to inefficient feature learning. Through scaling arguments and theoretical analysis, they show that setting different learning rates for A and B enables more efficient feature learning. This is validated through extensive experiments on GLUE tasks with models including GPT-2, RoBERTa, and Llama, where LoRA+ outperforms standard LoRA in both final performance and training speed.

The optimal learning-rate ratio is found to be significantly larger for certain tasks and models, and it depends on the specific architecture and task; the paper notes that further research is needed to determine the best ratio for different scenarios. The study concludes that LoRA+ is a more efficient and effective method for fine-tuning large language models, particularly on challenging tasks, and its findings offer practical guidance for choosing learning rates in LoRA-style fine-tuning.
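The core recipe — one base learning rate for A and a larger one for B — maps directly onto optimizer parameter groups. Below is a minimal PyTorch sketch of this idea; the `LoRALinear` module, the `lora_A`/`lora_B` naming, and the ratio of 16 are illustrative assumptions, not the paper's reference implementation, and the best ratio depends on the model and task.

```python
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Frozen linear layer with a trainable low-rank update: W + (alpha/r) * B A.

    Hypothetical module for illustration; real LoRA implementations differ in details.
    """

    def __init__(self, in_features, out_features, r=8, alpha=16):
        super().__init__()
        # Pretrained weight stays frozen during fine-tuning.
        self.weight = nn.Parameter(torch.randn(out_features, in_features),
                                   requires_grad=False)
        # A is initialized small and random, B at zero, so the update starts at zero.
        self.lora_A = nn.Parameter(torch.randn(r, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_features, r))
        self.scaling = alpha / r

    def forward(self, x):
        return x @ self.weight.T + self.scaling * (x @ self.lora_A.T) @ self.lora_B.T


def loraplus_param_groups(model, lr=1e-4, ratio=16.0):
    """Split trainable parameters so B matrices get lr * ratio (the LoRA+ idea).

    The ratio is a hyperparameter; a value around 16 is a plausible starting
    point, but the optimal value varies by architecture and task.
    """
    a_params, b_params = [], []
    for name, p in model.named_parameters():
        if not p.requires_grad:
            continue
        (b_params if "lora_B" in name else a_params).append(p)
    return [
        {"params": a_params, "lr": lr},            # base learning rate for A
        {"params": b_params, "lr": lr * ratio},    # larger learning rate for B
    ]


model = LoRALinear(128, 128)
optimizer = torch.optim.AdamW(loraplus_param_groups(model, lr=1e-4, ratio=16.0))
```

With this setup, a standard training loop (`optimizer.step()` after backpropagation) applies the two learning rates automatically; setting `ratio=1.0` recovers vanilla LoRA.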