This paper introduces Riemannian Preconditioned LoRA, a method that improves the training of large foundation models by introducing a preconditioner into the optimization process. LoRA, a parameter-efficient fine-tuning method, adds low-rank matrices to existing model weights and trains only these additive components. The proposed method applies an $ r \times r $ preconditioner in each gradient step, where $ r $ is the LoRA rank, to stabilize feature learning and improve the convergence and reliability of optimizers such as SGD and AdamW. Theoretical analysis shows that the preconditioner stabilizes feature learning in the infinite-width neural network setting, and the paper additionally provides convergence guarantees for reparameterized two-layer ReLU networks. In particular, the preconditioned method achieves stable feature learning with a single learning rate shared by both LoRA factors, whereas unpreconditioned training requires different learning rates for the two factors to remain stable. Empirically, the preconditioner markedly improves the convergence and robustness of SGD and AdamW and reduces the need for careful learning-rate tuning. The preconditioner is derived from a novel Riemannian metric on the space of low-rank matrices, adds negligible storage and runtime overhead, and can be implemented with only small changes to existing optimizer code. Experiments show it is effective across a wide range of models and tasks, including large language models and text-to-image diffusion models.
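To make the "small change to existing optimizer code" concrete, the sketch below shows a minimal preconditioned SGD step for LoRA factors $ B \in \mathbb{R}^{m \times r} $ and $ A \in \mathbb{R}^{r \times n} $. This is an illustrative sketch only: it assumes the $ r \times r $ preconditioners take a scaled-gradient-descent form, $(B^\top B + \delta I)^{-1}$ for the gradient of $A$ and $(A A^\top + \delta I)^{-1}$ for the gradient of $B$; the function name, damping parameter `delta`, and exact metric details are assumptions, not the paper's verbatim implementation.

```python
import torch

def preconditioned_lora_sgd_step(A, B, lr, delta=1e-8):
    """One hypothetical preconditioned SGD step for LoRA factors.

    A: (r x n) tensor with .grad populated
    B: (m x r) tensor with .grad populated
    The preconditioners are r x r, so the extra cost is negligible
    when r is small (e.g. 4-64), matching the low-overhead claim.
    """
    r = A.shape[0]
    damp = delta * torch.eye(r, device=A.device, dtype=A.dtype)
    with torch.no_grad():
        # r x r preconditioners (assumed scaled-gradient-descent form)
        P_A = torch.linalg.inv(B.T @ B + damp)   # preconditions grad of A
        P_B = torch.linalg.inv(A @ A.T + damp)   # preconditions grad of B
        A -= lr * (P_A @ A.grad)                 # (r x r)(r x n)
        B -= lr * (B.grad @ P_B)                 # (m x r)(r x r)
        A.grad = None
        B.grad = None
```

Because both preconditioners are only $ r \times r $, the inverse (or a linear solve) is cheap relative to the forward and backward passes, which is consistent with the paper's claim of negligible storage and runtime overhead.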