27 Feb 2024 | Jiacheng Zhu, Kristjan Greenewald, Kimia Nadjahi, Haitz Sáez de Ocáriz Borde, Rickard Brüel Gabrielsson, Leshem Choshen, Marzyeh Ghassemi, Mikhail Yurochkin, Justin Solomon
This paper investigates the asymmetry between the roles of the low-rank adapter matrices A and B in LoRA (Low-Rank Adaptation) fine-tuning of foundation models. The study finds that B plays a more critical role than A: fine-tuning B alone achieves performance comparable to fine-tuning both matrices. Theoretical and empirical analyses show that B projects features toward the desired output, while A extracts features from the input. This asymmetry is supported by experiments across models and tasks, including RoBERTa, BART-Large, LLaMA-2, and Vision Transformers (ViTs). The findings suggest that freezing A and fine-tuning only B can improve generalization while reducing the number of trainable parameters, leading to more efficient and effective fine-tuning strategies. The paper also develops the theoretical underpinnings of this asymmetry, showing how the structure of the low-rank adaptation can be exploited to enhance LoRA's performance. The results underscore the importance of understanding the roles of different components in parameter-efficient fine-tuning and provide practical guidance for improving the efficiency and generalization of LoRA-based models.
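To make the freeze-A, tune-B recipe concrete, here is a minimal PyTorch-style sketch of a LoRA-wrapped linear layer in which the base weights and A are frozen and only B is trained. The class name `LoRALinear`, the random initialization of A, and the `alpha`/`rank` scaling are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Frozen base linear layer plus a low-rank update scale * B @ A.

    Illustrative sketch: only B is trainable, following the freeze-A,
    fine-tune-B strategy discussed above.
    """

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # base weights stay frozen, as in standard LoRA

        d_out, d_in = base.weight.shape
        # A "extracts" input features; here it is frozen at a random initialization.
        self.A = nn.Parameter(torch.randn(rank, d_in) / d_in ** 0.5, requires_grad=False)
        # B "projects" toward the output; it is the only adapter factor that is trained.
        self.B = nn.Parameter(torch.zeros(d_out, rank), requires_grad=True)
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * (x @ self.A.T) @ self.B.T


# Usage: wrap a layer and confirm that only B receives gradients.
layer = LoRALinear(nn.Linear(64, 32), rank=4)
x = torch.randn(8, 64)
layer(x).sum().backward()
print(layer.B.grad is not None, layer.A.grad is None)  # True True
```

Compared with standard LoRA, the only change in this sketch is that A's `requires_grad` flag is set to False, which halves the adapter's trainable parameter count when the input and output dimensions match.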