15 Apr 2024 | Yang Lin, Xinyu Ma, Xu Chu, Yujie Jin, Zhibang Yang, Yasha Wang, Hong Mei
This paper addresses overfitting in parameter-efficient fine-tuning (PEFT) methods, particularly those based on Low-Rank Adaptation (LoRA). The authors propose a LoRA Dropout mechanism that introduces random noise into the learnable low-rank matrices, increasing parameter sparsity. They present a theoretical framework for this mechanism from the perspective of sparsity regularization and derive a generalization error bound, showing that appropriate sparsity helps tighten the gap between empirical and generalization risks and thereby controls overfitting. They also introduce a test-time ensemble strategy that further compresses the error bound and improves performance at inference. Extensive experiments on a range of natural language processing (NLP) tasks validate the effectiveness of the proposed LoRA Dropout framework in improving model accuracy and calibration. The paper also discusses limitations and future work, including the need for parallel computing to reduce time overhead.
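The core mechanism is straightforward to sketch: during fine-tuning, random dropout masks are sampled over the learnable low-rank matrices, and at test time the outputs from several sampled masks are averaged. Below is a minimal, illustrative PyTorch sketch under those assumptions; the class `LoRADropoutLinear`, its hyperparameters, and `ensemble_predict` are hypothetical names for exposition, not the authors' implementation.

```python
import torch
import torch.nn as nn


class LoRADropoutLinear(nn.Module):
    """Illustrative LoRA layer with dropout on the low-rank factors (not the paper's code).

    The frozen base weight W is augmented with a low-rank update B @ A.
    During training, Bernoulli masks are sampled over the columns of A and
    the rows of B, sparsifying the learnable parameters.
    """

    def __init__(self, in_features, out_features, rank=8, p_drop=0.1):
        super().__init__()
        self.base = nn.Linear(in_features, out_features, bias=False)
        self.base.weight.requires_grad_(False)   # frozen pre-trained weight
        self.lora_A = nn.Parameter(torch.randn(rank, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_features, rank))
        self.p_drop = p_drop

    def forward(self, x, sample_mask=True):
        if sample_mask:
            # Sample masks over the input dimension of A and output dimension of B;
            # scaling by 1/(1 - p) keeps the expected low-rank update unchanged.
            keep = 1.0 - self.p_drop
            mask_A = (torch.rand(self.lora_A.size(1), device=x.device) < keep).float() / keep
            mask_B = (torch.rand(self.lora_B.size(0), device=x.device) < keep).float() / keep
            A = self.lora_A * mask_A                # drop columns of A
            B = self.lora_B * mask_B.unsqueeze(1)   # drop rows of B
        else:
            A, B = self.lora_A, self.lora_B
        return self.base(x) + x @ A.t() @ B.t()


@torch.no_grad()
def ensemble_predict(layer, x, n_samples=8):
    """Test-time ensemble: average outputs over several independently sampled masks."""
    outs = [layer(x, sample_mask=True) for _ in range(n_samples)]
    return torch.stack(outs).mean(dim=0)
```

In this sketch, averaging over `n_samples` sampled masks at inference mirrors the paper's test-time ensemble idea; the extra forward passes are the source of the time overhead that the authors suggest mitigating with parallel computation.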