LoRA: Low-Rank Adaptation of Large Language Models


16 Oct 2021 | Edward Hu*, Yelong Shen*, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen
LoRA (Low-Rank Adaptation) is a parameter-efficient method for adapting large language models (LLMs) to downstream tasks without retraining all of their parameters. It freezes the pre-trained weights and injects trainable low-rank decomposition matrices into each layer of the Transformer architecture, greatly reducing the number of trainable parameters. Compared to fine-tuning GPT-3 175B with Adam, LoRA reduces the number of trainable parameters by roughly 10,000x and the GPU memory requirement by 3x, while performing on par with or better than full fine-tuning on RoBERTa, DeBERTa, GPT-2, and GPT-3. Unlike adapter layers, LoRA adds no inference latency, because the low-rank update can be merged into the frozen weights; it also enables fast task-switching, since only the small low-rank matrices need to be swapped. A PyTorch integration is released as an open-source package.

The method builds on the observation that the weight updates learned during adaptation have a low intrinsic rank, which enables efficient training and deployment. Empirical studies show that LoRA matches or outperforms alternatives such as adapter layers and prefix tuning in both quality and efficiency, and it is especially attractive for very large models like GPT-3, where full fine-tuning is computationally prohibitive. The approach generalizes to any neural network with dense layers and opens further directions for research in parameter-efficient adaptation.
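To make the core mechanism concrete, here is a minimal PyTorch sketch of a LoRA-adapted linear layer: the pre-trained weight W0 stays frozen while a low-rank update BA (rank r, with a scaling factor alpha/r) is trained and can later be merged into W0 for latency-free inference. The class name LoRALinear and the hyperparameter defaults below are illustrative assumptions for this sketch, not the released loralib API.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LoRALinear(nn.Module):
    """Illustrative sketch of a linear layer with a LoRA update: h = (W0 + BA) x."""

    def __init__(self, in_features, out_features, r=8, alpha=16):
        super().__init__()
        # Frozen pre-trained weight W0; in practice it is copied from the base model.
        self.weight = nn.Parameter(torch.zeros(out_features, in_features), requires_grad=False)
        # Trainable low-rank factors: delta_W = B @ A with rank r << min(d, k).
        self.lora_A = nn.Parameter(torch.randn(r, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_features, r))  # B starts at zero, so delta_W = 0 initially
        self.scaling = alpha / r

    def forward(self, x):
        base = F.linear(x, self.weight)                       # x @ W0^T
        lora = F.linear(F.linear(x, self.lora_A), self.lora_B)  # x @ A^T @ B^T
        return base + self.scaling * lora

    @torch.no_grad()
    def merge(self):
        # Fold BA into W0 for deployment, so inference incurs no extra latency.
        self.weight += self.scaling * (self.lora_B @ self.lora_A)
```

Because only `lora_A` and `lora_B` require gradients, optimizer state and checkpoints per task stay tiny, and switching tasks amounts to swapping these small matrices (or merging and un-merging them) rather than reloading the full model.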