MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning


20 May 2024 | Ting Jiang, Shaohan Huang, Shengyue Luo, Zihan Zhang, Haizhen Huang, Furu Wei, Weiwei Deng, Feng Sun, Qi Zhang, Deqing Wang, Fuzhen Zhuang
MoRA is a parameter-efficient fine-tuning method for large language models (LLMs) that improves on Low-Rank Adaptation (LoRA) by replacing LoRA's pair of low-rank matrices with a single square matrix, achieving high-rank updates while keeping the same number of trainable parameters. The motivation is that LoRA's low-rank update ΔW has been found to limit an LLM's ability to learn and memorize new knowledge, particularly in memory-intensive settings such as continual pretraining.

Because the square matrix does not match the dimensions of the pretrained weight, MoRA introduces non-parameterized operators that compress the input dimension and decompress the output dimension around it. These operators add no trainable parameters, and the overall update remains a linear map, so the learned weight can be merged back into the LLM after training, just as with LoRA.

The method is evaluated on five tasks: instruction tuning, mathematical reasoning, continual pretraining, memorization, and pretraining. MoRA performs comparably to LoRA on instruction tuning and mathematical reasoning, outperforms it on the memory-intensive tasks (memorizing new knowledge and continual pretraining), and achieves lower perplexity than both LoRA and ReLoRA in pretraining. Analysis shows that high-rank updating increases the rank of ΔW, which accounts for the gains on the memory and pretraining tasks.
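To make the mechanism concrete, below is a minimal sketch of a MoRA-style adapter in PyTorch. The paper explores several non-parameterized compression/decompression operators (truncation, sharing, and a rotation-based grouping); this sketch uses a simple reshape-into-chunks compression and trim/zero-pad decompression to illustrate the idea, so the class name MoRALinear, the chunking scheme, and the exact operators here are illustrative assumptions, not the authors' implementation. The square matrix is sized so its r̂² trainable entries roughly match LoRA's r·(d_in + d_out) parameter budget.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoRALinear(nn.Module):
    """Minimal MoRA-style adapter around a frozen linear layer (sketch).

    Instead of LoRA's two low-rank matrices (d_out x r and r x d_in), a
    single square matrix M of size r_hat x r_hat is trained, with r_hat
    chosen so M has roughly the same number of parameters as the LoRA pair.
    """

    def __init__(self, base: nn.Linear, lora_rank: int = 8):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # pretrained weight stays frozen
        d_in, d_out = base.in_features, base.out_features
        # Parameter matching: r * (d_in + d_out) in LoRA vs. r_hat^2 here.
        self.r_hat = int(math.sqrt(lora_rank * (d_in + d_out)))
        # Zero init => the update Delta W is zero at the start, as in LoRA.
        self.M = nn.Parameter(torch.zeros(self.r_hat, self.r_hat))
        self.d_out = d_out

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = self.base(x)
        r = self.r_hat
        # Compress (non-parameterized): zero-pad the input to a multiple of
        # r_hat and fold it into chunks of length r_hat.
        pad = (-x.shape[-1]) % r
        chunks = F.pad(x, (0, pad)).reshape(*x.shape[:-1], -1, r)
        # High-rank update: every chunk passes through the shared square M.
        h = (chunks @ self.M.T).reshape(*x.shape[:-1], -1)
        # Decompress (non-parameterized): trim or zero-pad to the output width.
        if h.shape[-1] >= self.d_out:
            h = h[..., : self.d_out]
        else:
            h = F.pad(h, (0, self.d_out - h.shape[-1]))
        return y + h

# Hypothetical usage on one projection of a transformer layer:
layer = MoRALinear(nn.Linear(4096, 4096), lora_rank=8)
out = layer(torch.randn(2, 16, 4096))       # (batch, seq, hidden)
print(out.shape, layer.r_hat)               # torch.Size([2, 16, 4096]) 256
```

Note that compression, the shared matrix M, and decompression compose into a single linear map of the input, so the adapter is equivalent to some d_out × d_in matrix ΔW and can be folded into the frozen weight after training; unlike LoRA's BA factorization, that ΔW is not constrained to rank r.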