27 Feb 2024 | Jiacheng Zhu, Kristjan Greenewald, Kimia Nadjahi, Haitz Sáez de Ocáriz Borde, Rickard Brüel Gabrielsson, Leshem Choshen, Marzyeh Ghassemi, Mikhail Yurochkin, Justin Solomon
This paper investigates the asymmetry between the roles of the two low-rank adapter matrices in Low-Rank Adaptation (LoRA) for fine-tuning large, pre-trained foundation models. LoRA represents the weight update as a product of two low-rank matrices, \( \Delta W = BA \), where \( B \) has few columns and \( A \) has few rows relative to \( \Delta W \). The study finds that \( B \) is crucial for performance, while \( A \) matters far less: training \( B \) alone is more effective than training both \( A \) and \( B \), and a randomly initialized \( A \) performs comparably to a fine-tuned one. The paper also shows that fixing \( A \) to a random orthogonal matrix can improve generalization while roughly halving the number of trainable parameters. Experiments on a range of models and datasets, including RoBERTa, BART-Large, LLaMA-2, and ViTs, support these findings. The results highlight the value of focusing on \( B \) for efficient and effective fine-tuning.
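To make the \( B \)-only setup concrete, below is a minimal PyTorch sketch of a LoRA-style linear layer in which \( A \) is frozen at a random orthogonal matrix and only \( B \) is trained. The class name, rank, and alpha/rank scaling convention are illustrative assumptions, not taken from the paper's released code.

```python
# Minimal sketch (assumed names/conventions): LoRA-style adapter that trains
# only B, with A fixed to a random orthogonal matrix, as studied in the paper.
import torch
import torch.nn as nn


class LoRALinearBOnly(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():  # freeze the pre-trained weights
            p.requires_grad = False

        d_out, d_in = base.weight.shape
        self.scaling = alpha / rank

        # A: (rank, d_in), set to a random orthogonal matrix and never trained.
        A = torch.empty(rank, d_in)
        nn.init.orthogonal_(A)
        self.register_buffer("A", A)

        # B: (d_out, rank), zero-initialized and trainable; Delta W = B @ A.
        self.B = nn.Parameter(torch.zeros(d_out, rank))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # y = W0 x + scaling * B (A x)
        return self.base(x) + self.scaling * (x @ self.A.T) @ self.B.T


if __name__ == "__main__":
    layer = LoRALinearBOnly(nn.Linear(768, 768), rank=8)
    print([n for n, p in layer.named_parameters() if p.requires_grad])  # ['B']
    print(layer(torch.randn(2, 768)).shape)  # torch.Size([2, 768])
```

Because \( A \) is a fixed buffer rather than a parameter, each adapted layer trains only the \( d_{\text{out}} \times r \) entries of \( B \), which is where the roughly 2x parameter savings over standard LoRA comes from.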