MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning


20 May 2024 | Ting Jiang, Shaohan Huang, Shengyue Luo, Zihan Zhang, Haizhen Huang, Furu Wei, Weiwei Deng, Feng Sun, Qi Zhang, Deqing Wang, Fuzhen Zhuang
MoRA is a parameter-efficient fine-tuning method for large language models (LLMs) that improves on Low-Rank Adaptation (LoRA) by replacing LoRA's pair of low-rank matrices with a single square matrix, achieving high-rank updates while keeping the same number of trainable parameters. The motivation is that LoRA's low-rank update ΔW has been found to limit an LLM's ability to learn and memorize new knowledge, particularly in memory-intensive settings such as continual pretraining.

Because the square matrix does not match the dimensions of the pretrained weight, MoRA introduces non-parameterized operators that compress the input dimension and decompress the output dimension around it. These operators add no trainable parameters, and the overall update remains a linear map, so the learned weight can be merged back into the LLM after training, just as with LoRA.

The method is evaluated on five tasks: instruction tuning, mathematical reasoning, continual pretraining, memorization, and pretraining. MoRA performs comparably to LoRA on instruction tuning and mathematical reasoning, outperforms it on the memory-intensive tasks (memorizing new knowledge and continual pretraining), and achieves lower perplexity than both LoRA and ReLoRA in pretraining. Analysis shows that high-rank updating increases the rank of ΔW, which accounts for the gains on the memory and pretraining tasks.
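To make the mechanism concrete, below is a minimal sketch of a MoRA-style adapter in PyTorch. The paper explores several non-parameterized compression/decompression operators (truncation, sharing, and a rotation-based grouping); this sketch uses a simple reshape-into-chunks compression and trim/zero-pad decompression to illustrate the idea, so the class name MoRALinear, the chunking scheme, and the exact operators here are illustrative assumptions, not the authors' implementation. The square matrix is sized so its r̂² trainable entries roughly match LoRA's r·(d_in + d_out) parameter budget.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoRALinear(nn.Module):
    """Minimal MoRA-style adapter around a frozen linear layer (sketch).

    Instead of LoRA's two low-rank matrices (d_out x r and r x d_in), a
    single square matrix M of size r_hat x r_hat is trained, with r_hat
    chosen so M has roughly the same number of parameters as the LoRA pair.
    """

    def __init__(self, base: nn.Linear, lora_rank: int = 8):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # pretrained weight stays frozen
        d_in, d_out = base.in_features, base.out_features
        # Parameter matching: r * (d_in + d_out) in LoRA vs. r_hat^2 here.
        self.r_hat = int(math.sqrt(lora_rank * (d_in + d_out)))
        # Zero init => the update Delta W is zero at the start, as in LoRA.
        self.M = nn.Parameter(torch.zeros(self.r_hat, self.r_hat))
        self.d_out = d_out

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = self.base(x)
        r = self.r_hat
        # Compress (non-parameterized): zero-pad the input to a multiple of
        # r_hat and fold it into chunks of length r_hat.
        pad = (-x.shape[-1]) % r
        chunks = F.pad(x, (0, pad)).reshape(*x.shape[:-1], -1, r)
        # High-rank update: every chunk passes through the shared square M.
        h = (chunks @ self.M.T).reshape(*x.shape[:-1], -1)
        # Decompress (non-parameterized): trim or zero-pad to the output width.
        if h.shape[-1] >= self.d_out:
            h = h[..., : self.d_out]
        else:
            h = F.pad(h, (0, self.d_out - h.shape[-1]))
        return y + h

# Hypothetical usage on one projection of a transformer layer:
layer = MoRALinear(nn.Linear(4096, 4096), lora_rank=8)
out = layer(torch.randn(2, 16, 4096))       # (batch, seq, hidden)
print(out.shape, layer.r_hat)               # torch.Size([2, 16, 4096]) 256
```

Note that compression, the shared matrix M, and decompression compose into a single linear map of the input, so the adapter is equivalent to some d_out × d_in matrix ΔW and can be folded into the frozen weight after training; unlike LoRA's BA factorization, that ΔW is not constrained to rank r.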