LoRA-GA: Low-Rank Adaptation with Gradient Approximation
Fine-tuning large-scale pretrained models is computationally and memory-intensive. LoRA, a popular Parameter-Efficient Fine-Tuning (PEFT) method, reduces these costs by training a low-rank auxiliary model alongside the frozen pretrained weights. However, LoRA typically converges more slowly than full fine-tuning, which raises overall training cost and often hurts final performance. This paper introduces LoRA-GA, a novel initialization method that aligns the gradient of the low-rank matrix product with that of full fine-tuning at the first step, significantly improving convergence speed and performance. LoRA-GA achieves convergence rates comparable to full fine-tuning while matching or exceeding performance on benchmarks such as GLUE and MT-Bench. It also reduces memory usage and training time compared to vanilla LoRA. Careful initialization further ensures rank and scale stability, making the method both efficient and effective for parameter-efficient fine-tuning.
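To make the gradient-alignment idea concrete, below is a minimal sketch of such an initialization in PyTorch. It is a plausible realization under stated assumptions, not the paper's exact algorithm: it takes an estimate of the full fine-tuning gradient G of a weight matrix (accumulated over a few sampled batches), computes its SVD, and initializes B and A from singular vectors so that the first update of the product BA points along the dominant directions of G. The function name lora_ga_init, the particular singular-vector slices, and the default alpha are assumptions made for illustration.

```python
import torch

@torch.no_grad()
def lora_ga_init(weight: torch.Tensor, grad: torch.Tensor, rank: int,
                 alpha: float = 16.0):
    """Hypothetical sketch of a gradient-aligned LoRA initialization.

    weight: frozen pretrained weight W, shape (out_features, in_features).
    grad:   estimate of the full fine-tuning gradient dL/dW for this layer,
            accumulated over a few calibration batches.
    The slicing of U and Vh below is one plausible reading of the
    gradient-alignment idea, not the paper's exact formula.
    """
    m, n = weight.shape
    assert 2 * rank <= min(m, n), "need 2r singular directions"

    # SVD of the estimated full-model gradient.
    U, S, Vh = torch.linalg.svd(grad.float(), full_matrices=False)

    # Initialize B and A from disjoint leading singular directions so the
    # first update of B @ A moves along the dominant directions of grad.
    B = U[:, :rank].to(weight.dtype)             # (m, r)
    A = Vh[rank:2 * rank, :].to(weight.dtype)    # (r, n)
    scaling = alpha / rank

    # Since both A and B are nonzero at initialization, offset the frozen
    # weight so that W + scaling * B @ A reproduces the original weight,
    # leaving the model's output unchanged at step 0.
    weight -= scaling * (B @ A)
    return A, B, scaling


# Usage sketch (layer and gradient names are hypothetical): the adapted
# forward pass would then compute y = x @ W.T + s * (x @ A.T) @ B.T.
# A, B, s = lora_ga_init(layer.weight.data, grad_estimate, rank=8)
```

The weight offset in the last step is what lets a nonzero initialization of both A and B coexist with an unchanged initial model output; vanilla LoRA instead achieves this by setting B to zero, which is precisely what a gradient-aligned scheme must avoid.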