This paper addresses an inefficiency of Low Rank Adaptation (LoRA) when fine-tuning large models with high embedding dimensions. LoRA, which adapts a pre-trained model through the product of two low-rank matrices \(A\) and \(B\), often falls short of optimal performance because both matrices are updated with the same learning rate. The authors show that this choice prevents efficient feature learning in large-width networks. They propose LoRA+, which assigns separate learning rates to \(A\) and \(B\), with \(\eta_B\) set much larger than \(\eta_A\). This correction improves both performance (1%–2% gains) and fine-tuning speed (up to a 2X speedup) while keeping the same computational cost as standard LoRA. Extensive experiments on various language models and tasks validate the effectiveness of LoRA+. The paper also provides theoretical insight into the scaling of neural networks, showing that efficient feature learning requires a specific learning-rate scaling in the infinite-width limit.
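As a concrete illustration of the core idea, below is a minimal PyTorch sketch of splitting the LoRA adapter parameters into two optimizer groups so that \(B\) gets a larger learning rate than \(A\). This is not the authors' reference implementation: the module names (`lora_A`, `lora_B`), the base learning rate, and the ratio \(\eta_B / \eta_A = 16\) are illustrative assumptions.

```python
# Sketch of the LoRA+ learning-rate split via optimizer parameter groups.
# Assumptions: adapter parameters are named "lora_A" / "lora_B"; the base
# learning rate (1e-4) and the ratio eta_B / eta_A = 16 are placeholder values.
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Frozen base linear layer plus a rank-r LoRA adapter (B @ A)."""

    def __init__(self, d_in, d_out, r=8):
        super().__init__()
        self.base = nn.Linear(d_in, d_out, bias=False)
        self.base.weight.requires_grad_(False)  # pre-trained weights stay frozen
        self.lora_A = nn.Parameter(torch.randn(r, d_in) / d_in**0.5)  # A: random init
        self.lora_B = nn.Parameter(torch.zeros(d_out, r))             # B: zero init

    def forward(self, x):
        return self.base(x) + x @ self.lora_A.T @ self.lora_B.T


def loraplus_param_groups(model, lr_A=1e-4, ratio=16.0):
    """Return optimizer param groups with eta_B = ratio * eta_A."""
    group_A, group_B = [], []
    for name, p in model.named_parameters():
        if not p.requires_grad:
            continue  # skip frozen pre-trained weights
        (group_B if "lora_B" in name else group_A).append(p)
    return [
        {"params": group_A, "lr": lr_A},
        {"params": group_B, "lr": lr_A * ratio},
    ]


# Usage: two LoRA layers trained with AdamW, B updated 16x faster than A.
model = nn.Sequential(LoRALinear(512, 512), nn.ReLU(), LoRALinear(512, 512))
optimizer = torch.optim.AdamW(loraplus_param_groups(model, lr_A=1e-4, ratio=16.0))
```

Because the split only changes how the optimizer groups existing parameters, it adds no extra memory or compute relative to standard LoRA.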