LoRA Training in the NTK Regime has No Spurious Local Minima

28 May 2024 | Uijeong Jang, Jason D. Lee, Ernest K. Ryu
This paper theoretically analyzes the effectiveness of Low-rank Adaptation (LoRA) in the neural tangent kernel (NTK) regime for fine-tuning large language models (LLMs). The main findings are:

1. **Existence of Low-Rank Solutions**: Full fine-tuning (without LoRA) admits a low-rank solution with rank \( r \lesssim \sqrt{N} \), where \( N \) is the number of training data points.
2. **Elimination of Spurious Local Minima**: Using LoRA with rank \( r \gtrsim \sqrt{N} \) eliminates spurious local minima, allowing stochastic gradient descent to find the low-rank solutions.
3. **Generalization**: The low-rank solution found using LoRA generalizes well.

The paper provides theoretical guarantees on the trainability and generalization capabilities of LoRA, addressing the limitations of previous theoretical analyses. It also includes experimental results validating the theoretical findings.
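To make the setting concrete, the sketch below shows a minimal LoRA parameterization (a frozen pretrained weight plus a trainable low-rank update \( BA \)) with the adapter rank chosen on the order of \( \sqrt{N} \), as suggested by the scaling in the paper's results. This is an illustrative PyTorch example, not the authors' code; the class name `LoRALinear`, the initialization choices, and the dimensions are assumptions for demonstration.

```python
import math
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen pretrained weight W0 plus a trainable low-rank update B @ A (illustrative)."""
    def __init__(self, in_features, out_features, rank, alpha=1.0):
        super().__init__()
        # Pretrained weight is frozen during LoRA fine-tuning.
        self.weight = nn.Parameter(torch.empty(out_features, in_features), requires_grad=False)
        nn.init.kaiming_uniform_(self.weight, a=math.sqrt(5))
        # LoRA factors: A initialized randomly, B at zero, so the update starts at zero.
        self.lora_A = nn.Parameter(torch.randn(rank, in_features) / math.sqrt(in_features))
        self.lora_B = nn.Parameter(torch.zeros(out_features, rank))
        self.scaling = alpha / rank

    def forward(self, x):
        # Effective weight is W0 + scaling * B @ A, applied without materializing the sum.
        return x @ self.weight.T + self.scaling * (x @ self.lora_A.T) @ self.lora_B.T

# Rank heuristic matching the paper's scaling: r on the order of sqrt(N),
# where N is the number of fine-tuning examples (illustrative values).
N = 1024
rank = max(1, math.ceil(math.sqrt(N)))
layer = LoRALinear(in_features=768, out_features=768, rank=rank)
```

Only `lora_A` and `lora_B` receive gradients, so the number of trainable parameters grows linearly in the rank rather than quadratically in the layer width.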