ResLoRA: Identity Residual Mapping in Low-Rank Adaptation
**Authors:** Shuhua Shi, Shaohan Huang, Minghui Song, Zhoujun Li, Zihan Zhang, Haizhen Huang, Furu Wei, Weiwei Deng, Feng Sun, Qi Zhang
**Institution:** School of Computer Science and Engineering, Beihang University, Beijing, China; Microsoft
**Abstract:**
Low-rank adaptation (LoRA) is a popular parameter-efficient fine-tuning (PEFT) method for large language models (LLMs). However, the long calculation path in the original model makes it challenging to update the parameters of LoRA blocks efficiently. To address this, we propose ResLoRA, an improved framework built on LoRA. By adding residual paths during training and using merging approaches to eliminate these paths during inference, ResLoRA achieves better results in fewer training steps than LoRA, without any extra trainable parameters or inference cost. Experiments on NLG, NLU, and text-to-image tasks demonstrate the effectiveness of ResLoRA. To our knowledge, ResLoRA is the first work to combine residual paths with LoRA.
**Contributions:**
- We propose ResLoRA, a novel framework that improves LoRA by adding residual paths to accelerate loss reduction during training.
- We investigate different merge approaches to convert ResLoRA blocks to LoRA blocks, ensuring they can be merged into the original linear layer without extra inference cost.
- We evaluate the effectiveness of ResLoRA on various models and tasks, validating its robustness and performance gains.
**Methods:**
- **LoRA Blocks:** LoRA adds two low-rank matrices in parallel with the original linear layer; only these matrices are trained, and they are merged into the original weights for inference.
- **ResLoRA Blocks:** Inspired by ResNet, ResLoRA introduces residual paths into LoRA blocks, in three variants: input-shortcut, block-shortcut, and middle-shortcut structures (see the sketch after this list).
- **Merging Approaches:** Two approaches convert ResLoRA blocks back into plain LoRA blocks so they can be merged into the original linear layer: one based on the inputs of ResLoRA blocks and one based on their weights (see the merging sketch below).
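To make the block structures above concrete, here is a minimal PyTorch sketch of a plain LoRA linear layer and an input-shortcut ResLoRA variant. The class names, the initialization, and the exact way the previous block's input is threaded through the forward pass are illustrative assumptions, not the paper's reference implementation.

```python
# Minimal sketch of a LoRA block and an input-shortcut ResLoRA block.
# Class names and hyperparameters are illustrative assumptions.
from typing import Optional

import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Frozen linear layer W0 plus a trainable low-rank update B @ A."""

    def __init__(self, in_features: int, out_features: int, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = nn.Linear(in_features, out_features, bias=False)
        self.base.weight.requires_grad_(False)  # W0 stays frozen
        self.lora_A = nn.Parameter(torch.randn(r, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_features, r))
        self.scaling = alpha / r

    def lora_delta(self, x: torch.Tensor) -> torch.Tensor:
        # (alpha/r) * B A x, computed as two small matmuls
        return (x @ self.lora_A.T) @ self.lora_B.T * self.scaling

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # h = W0 x + (alpha/r) * B A x
        return self.base(x) + self.lora_delta(x)


class ResLoRALinearIS(LoRALinear):
    """Input-shortcut variant: the low-rank branch also sees the input of the
    previous ResLoRA block, shortening the path gradients take to earlier layers."""

    def forward(self, x: torch.Tensor, prev_input: Optional[torch.Tensor] = None) -> torch.Tensor:
        h = self.base(x) + self.lora_delta(x)
        if prev_input is not None:
            # Residual path: route the previous block's input through this block's A/B as well.
            h = h + self.lora_delta(prev_input)
        return h
```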
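For the merging step, the sketch below assumes the `LoRALinear` / `ResLoRALinearIS` classes above. The plain-LoRA merge is standard; for the ResLoRA conversion, a scalar `prev_factor` stands in for the paper's estimate of the previous block's contribution (derived either from tracked inputs or from block weights), so that part is an approximation rather than the exact formula.

```python
# Sketch of merging for inference, assuming the classes sketched above.
# `prev_factor` is an assumed stand-in for the paper's estimated scaling factor.
import torch


@torch.no_grad()
def merge_lora(layer: LoRALinear) -> None:
    """Fold the low-rank update into the frozen weight: W0 <- W0 + (alpha/r) * B A."""
    layer.base.weight += layer.scaling * (layer.lora_B @ layer.lora_A)


@torch.no_grad()
def merge_reslora_is(layer: ResLoRALinearIS, prev_factor: float) -> None:
    """Convert an input-shortcut ResLoRA block into an equivalent LoRA update, then merge.

    `prev_factor` approximates how large the previous block's input is relative to the
    current one; the paper estimates such a factor either from tracked inputs
    ("merge based on input") or from block weights ("merge based on weights").
    """
    delta = layer.scaling * (layer.lora_B @ layer.lora_A)                 # term for x_n
    delta += prev_factor * layer.scaling * (layer.lora_B @ layer.lora_A)  # approximates the x_{n-1} term
    layer.base.weight += delta
```

After merging, inference uses a single dense matmul per layer, so ResLoRA incurs no extra latency over the original model.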
**Experiments:**
- **NLG Tasks:** ResLoRA outperforms LoRA on various NLG tasks, achieving higher accuracy.
- **NLU Tasks:** ResLoRA shows significant improvement over LoRA on NLU tasks.
- **Text-to-Image Task:** At later training steps, ResLoRA generates more appropriate images than LoRA.
**Conclusion:**
ResLoRA effectively accelerates the training process and improves performance in low-rank adaptation without introducing extra trainable parameters or increasing inference cost.