2024 | Mahdi Nikdan, Soroush Tabesh, Elvir Crnčević, Dan Alistarh
RoSA is a parameter-efficient fine-tuning method for large language models (LLMs) that outperforms existing approaches such as LoRA and pure sparse fine-tuning. Inspired by robust principal component analysis, RoSA jointly trains low-rank and sparse components on top of fixed pretrained weights to efficiently approximate full fine-tuning (FFT). At comparable parameter and computational budgets it achieves higher accuracy, and on some tasks it matches FFT performance. RoSA comes with efficient GPU support and is compatible with low-precision base weights, yielding a joint representation that combines quantization, low-rank, and sparse approximations; in particular, base weights can be quantized via QLoRA without significant accuracy loss. The method is evaluated on generative tasks including grade-school math and SQL query generation, where it outperforms competing adapters, and it reaches high accuracy at substantially lower memory and compute cost than FFT, making it a practical choice in resource-constrained settings. The implementation is in PyTorch and is available at https://github.com/IST-DASLab/RoSA.
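To make the decomposition concrete, below is a minimal PyTorch sketch of a RoSA-style linear layer: the pretrained weight stays frozen while a low-rank pair (B, A) and a sparse correction S, restricted to a fixed support mask, are trained. The class name, the random mask selection, and the dense storage of S are illustrative assumptions; the actual repository selects the sparse support from gradient information and uses optimized sparse GPU kernels rather than a dense masked tensor.

```python
import torch
import torch.nn as nn

class RoSALinearSketch(nn.Module):
    """Conceptual sketch of a RoSA-style adapter: frozen base weight plus
    trainable low-rank (B @ A) and sparse (mask * S) corrections.
    Illustration only; not the paper's optimized implementation."""

    def __init__(self, base_weight: torch.Tensor, rank: int = 8, density: float = 0.01):
        super().__init__()
        out_f, in_f = base_weight.shape
        # Frozen pretrained weight (could also be kept quantized, as with QLoRA).
        self.register_buffer("W", base_weight)
        # Low-rank component: initialized so the delta B @ A starts at zero.
        self.A = nn.Parameter(torch.randn(rank, in_f) * 0.01)
        self.B = nn.Parameter(torch.zeros(out_f, rank))
        # Sparse component: values are trainable only on a fixed support mask.
        # A random mask is used here for illustration; a real implementation
        # would pick the support from gradient statistics and store only the
        # nonzero values (e.g., in a compressed sparse format).
        mask = (torch.rand(out_f, in_f) < density).float()
        self.register_buffer("mask", mask)
        self.S = nn.Parameter(torch.zeros(out_f, in_f))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Effective weight = pretrained + low-rank delta + sparse delta.
        delta = self.B @ self.A + self.mask * self.S
        return x @ (self.W + delta).T
```

During fine-tuning, only A, B, and the masked entries of S receive gradients, so the trainable parameter count stays a small fraction of the full weight matrix while the sparse term captures corrections that a purely low-rank adapter would miss.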