LoRA Dropout as a Sparsity Regularizer for Overfitting Control


15 Apr 2024 | Yang Lin, Xinyu Ma, Xu Chu, Yujie Jin, Zhibang Yang, Yasha Wang, Hong Mei
This paper proposes a LoRA Dropout framework to control overfitting in LoRA-based parameter-efficient fine-tuning (PEFT). LoRA is a popular method for parameter-efficient fine-tuning of large pre-trained language models (PLMs), in which the delta weight matrix is decomposed into a product of low-rank matrices. Selecting an appropriate rank for this decomposition remains a challenge: a high rank increases model complexity and risks overfitting, while a low rank may limit expressive power. To address this, the authors introduce LoRA Dropout, which applies dropout to the low-rank matrices during training. The resulting randomness and sparsity regularize the model and reduce overfitting.

Theoretical analysis shows that the sparsity introduced by LoRA Dropout tightens the gap between the empirical and generalization risks, so the framework can be viewed as an optimization problem with sparsity regularization. In addition, a test-time ensemble strategy is proposed in which multiple dropout instances are aggregated at inference, further compressing the error bound and improving performance.

The framework is validated through extensive experiments on NLP tasks including natural language understanding, question answering, instruction tuning, and confidence calibration. LoRA Dropout consistently outperforms baselines in both accuracy and calibration, and it reduces overfitting, as demonstrated by improved performance on tasks with distributional disparities between the training and evaluation sets. Ablation studies and sensitivity analyses show that the method is robust across different parameter settings. Overall, LoRA Dropout provides a theoretically grounded and practically effective solution for controlling overfitting in LoRA-based PEFT methods.
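As a rough illustration of the training-time mechanism described above, the sketch below wraps a frozen linear layer with a LoRA update whose low-rank factors are sparsified by dropout. It is a minimal PyTorch sketch under stated assumptions: the class and parameter names (`LoRADropoutLinear`, `dropout_p`, `alpha`) are illustrative, the masking here is elementwise, and the exact masking granularity and initialization used by the authors may differ.

```python
import torch
import torch.nn as nn


class LoRADropoutLinear(nn.Module):
    """Frozen linear layer plus a LoRA update whose low-rank factors
    are sparsified by dropout during training (illustrative sketch)."""

    def __init__(self, base: nn.Linear, rank: int = 8,
                 alpha: float = 16.0, dropout_p: float = 0.1):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # pre-trained weights stay frozen
        in_f, out_f = base.in_features, base.out_features
        # Low-rank factors of the delta weight: delta_W = B @ A, scaled by alpha / rank.
        self.lora_A = nn.Parameter(torch.randn(rank, in_f) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_f, rank))
        self.scaling = alpha / rank
        # Dropout applied to the low-rank factors themselves; this sampling
        # of sparse factors is the regularizing effect the paper analyses.
        self.drop_A = nn.Dropout(dropout_p)
        self.drop_B = nn.Dropout(dropout_p)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Fresh dropout masks for A and B are sampled at every training step.
        A = self.drop_A(self.lora_A)
        B = self.drop_B(self.lora_B)
        delta = x @ A.t() @ B.t() * self.scaling
        return self.base(x) + delta
```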
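The test-time ensemble strategy can be sketched in the same spirit: keep the dropout in the LoRA factors active at inference and average the predictions over several sampled masks. The function below is an assumption-laden illustration, not the authors' implementation; it assumes a classification model that returns logits, and the choice of `n_samples` and softmax averaging are placeholders.

```python
import torch


@torch.no_grad()
def ensemble_predict(model: torch.nn.Module, inputs: torch.Tensor,
                     n_samples: int = 8) -> torch.Tensor:
    """Average class probabilities over several dropout instances at test time
    (illustrative sketch of the test-time ensemble idea)."""
    model.train()  # keep dropout active so each forward pass samples a new mask
    probs = torch.stack([
        torch.softmax(model(inputs), dim=-1) for _ in range(n_samples)
    ]).mean(dim=0)
    model.eval()
    return probs
```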