LoRA Meets Dropout under a Unified Framework

27 May 2024 | Sheng Wang, Liheng Chen, Jiyue Jiang, Boyang Xue, Lingpeng Kong, Chuan Wu
This paper examines the apparent tension between parameter-efficient LoRA and conventional dropout methods when fine-tuning large language models (LLMs). Although LoRA updates only a small fraction of parameters, it remains prone to overfitting, the very problem dropout is designed to mitigate. The study revisits transformer-specific dropout variants, namely DropKey, DropAttention, and HiddenCut, and establishes their mathematical and empirical equivalences and differences. A unified framework built on three dimensions, dropping position, structural pattern, and compensation measure, is introduced to analyze these methods, and it reveals new preferences and performance comparisons when they are applied in LoRA scenarios. Guided by this framework, the authors propose HiddenKey, a novel dropout method that drops attention logits column-wise and hidden representations element-wise, and augments the vanilla task loss with a KL-divergence loss. Extensive experiments show that HiddenKey outperforms existing methods across multiple models and tasks, demonstrating its effectiveness in mitigating overfitting in LoRA-based fine-tuning. The study also highlights the importance of compensation measures in narrowing the gap between training and inference. Overall, the results indicate that HiddenKey is a superior approach to high-performance, parameter-efficient fine-tuning of LLMs.
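For illustration only, the sketch below shows how the three ingredients named above (column-wise dropout on attention logits, element-wise dropout on hidden representations, and a KL-divergence term added to the task loss) could be wired together in PyTorch. This is not the authors' implementation: the function names (`column_dropout`, `element_dropout`, `kl_consistency_loss`), the tensor layouts, and the symmetric form of the KL term are assumptions made for the sketch.

```python
# Minimal sketch of a HiddenKey-style recipe, assuming a standard
# (batch, heads, q_len, k_len) attention-logit layout. Not the paper's code.
import torch
import torch.nn.functional as F


def column_dropout(attn_logits: torch.Tensor, p: float, training: bool) -> torch.Tensor:
    """Drop whole key columns of the attention logits (column-wise pattern):
    every query position ignores the same randomly chosen keys."""
    if not training or p == 0.0:
        return attn_logits
    b, h, _, k_len = attn_logits.shape
    # One mask per (batch, head, key), broadcast across the query axis.
    keep = torch.rand(b, h, 1, k_len, device=attn_logits.device) >= p
    # Masked columns get -inf before softmax, so softmax itself renormalizes
    # the surviving keys; no explicit rescaling is applied on this path.
    return attn_logits.masked_fill(~keep, float("-inf"))


def element_dropout(hidden: torch.Tensor, p: float, training: bool) -> torch.Tensor:
    """Element-wise dropout on hidden representations, with the usual
    1/(1-p) rescaling as the train/inference compensation."""
    return F.dropout(hidden, p=p, training=training)


def kl_consistency_loss(logits_a: torch.Tensor, logits_b: torch.Tensor) -> torch.Tensor:
    """Symmetric KL divergence between two stochastic forward passes,
    used here as the auxiliary term added to the vanilla task loss."""
    log_p = F.log_softmax(logits_a, dim=-1)
    log_q = F.log_softmax(logits_b, dim=-1)
    kl_pq = F.kl_div(log_q, log_p, log_target=True, reduction="batchmean")
    kl_qp = F.kl_div(log_p, log_q, log_target=True, reduction="batchmean")
    return 0.5 * (kl_pq + kl_qp)
```

In a training step following this sketch, the same batch would be passed through the model twice with independent dropout masks to obtain `logits_a` and `logits_b`; the total loss would then be the averaged task loss plus a weighted `kl_consistency_loss`, which is one plausible reading of the KL-divergence compensation described in the summary.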