LoRA Learns Less and Forgets Less

15 May 2024 | Dan Biderman, Jose Gonzalez Ortiz, Jacob Portes, Mansheej Paul, Philip Greengard, Connor Jennings, Daniel King, Sam Havens, Vitaliy Chiley, Jonathan Frankle, Cody Blakeney, John P. Cunningham
Low-Rank Adaptation (LoRA) is a parameter-efficient method for fine-tuning large language models that trains low-rank perturbations to selected weight matrices. This study compares LoRA and full fine-tuning on two challenging domains, programming and mathematics, under both instruction fine-tuning and continued pretraining data regimes. The results show that LoRA generally underperforms full fine-tuning in accuracy and sample efficiency, particularly on programming tasks. However, LoRA exhibits better regularization properties: it maintains the base model's performance on tasks outside the target domain and provides stronger regularization than common techniques such as weight decay and dropout. LoRA also helps maintain more diverse generations. The study finds that full fine-tuning learns higher-rank perturbations, which may explain the performance gap. The paper concludes by proposing best practices for using LoRA, emphasizing the importance of learning rates, target modules, and rank selection.
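The mechanism being compared can be summarized in a few lines of code. Below is a minimal sketch, assuming PyTorch, of a LoRA-wrapped linear layer: the base weight is frozen and only a rank-r product of two small matrices is trained. The class name LoRALinear, the rank r=16, and the scaling alpha are illustrative choices for this sketch, not the paper's exact configuration.

import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    # Computes y = base(x) + (alpha / r) * x @ A.T @ B.T, where
    # A has shape (r, in_features) and B has shape (out_features, r),
    # so the learned perturbation B @ A has rank at most r.
    def __init__(self, base: nn.Linear, r: int = 16, alpha: float = 32.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # freeze the pretrained weight and bias
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: no perturbation at start
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * ((x @ self.A.T) @ self.B.T)

# Usage: wrap a projection matrix and count trainable parameters.
layer = LoRALinear(nn.Linear(4096, 4096), r=16)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)  # 2 * 16 * 4096 parameters instead of 4096 * 4096

The parameter count illustrates why LoRA is considered parameter-efficient, and also why its updates are constrained to low rank, which the paper contrasts with the higher-rank perturbations learned by full fine-tuning.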