2024 | Mahdi Nikdan, Soroush Tabesh, Elvir Crnčević, Dan Alistarh
RoSA is a parameter-efficient fine-tuning method for large language models (LLMs) that outperforms existing approaches such as LoRA and pure sparse fine-tuning. Inspired by robust principal component analysis, RoSA jointly trains low-rank and sparse components on top of fixed pretrained weights to efficiently approximate full fine-tuning (FFT). At comparable parameter and computational budgets it achieves higher accuracy, and on some tasks it matches FFT performance. RoSA comes with efficient GPU support and is compatible with low-precision base weights, yielding a joint representation that combines quantization, low-rank, and sparse approximations; in particular, base weights can be quantized via QLoRA without significant accuracy loss. The method is evaluated on generative tasks including grade-school math and SQL query generation, where it outperforms competing adapters, and it reaches high accuracy at substantially lower memory and compute cost than FFT, making it a practical choice in resource-constrained settings. The implementation is in PyTorch and is available at https://github.com/IST-DASLab/RoSA.
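To make the decomposition concrete, below is a minimal PyTorch sketch of a RoSA-style linear layer: the pretrained weight stays frozen while a low-rank pair (B, A) and a sparse correction S, restricted to a fixed support mask, are trained. The class name, the random mask selection, and the dense storage of S are illustrative assumptions; the actual repository selects the sparse support from gradient information and uses optimized sparse GPU kernels rather than a dense masked tensor.

```python
import torch
import torch.nn as nn

class RoSALinearSketch(nn.Module):
    """Conceptual sketch of a RoSA-style adapter: frozen base weight plus
    trainable low-rank (B @ A) and sparse (mask * S) corrections.
    Illustration only; not the paper's optimized implementation."""

    def __init__(self, base_weight: torch.Tensor, rank: int = 8, density: float = 0.01):
        super().__init__()
        out_f, in_f = base_weight.shape
        # Frozen pretrained weight (could also be kept quantized, as with QLoRA).
        self.register_buffer("W", base_weight)
        # Low-rank component: initialized so the delta B @ A starts at zero.
        self.A = nn.Parameter(torch.randn(rank, in_f) * 0.01)
        self.B = nn.Parameter(torch.zeros(out_f, rank))
        # Sparse component: values are trainable only on a fixed support mask.
        # A random mask is used here for illustration; a real implementation
        # would pick the support from gradient statistics and store only the
        # nonzero values (e.g., in a compressed sparse format).
        mask = (torch.rand(out_f, in_f) < density).float()
        self.register_buffer("mask", mask)
        self.S = nn.Parameter(torch.zeros(out_f, in_f))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Effective weight = pretrained + low-rank delta + sparse delta.
        delta = self.B @ self.A + self.mask * self.S
        return x @ (self.W + delta).T
```

During fine-tuning, only A, B, and the masked entries of S receive gradients, so the trainable parameter count stays a small fraction of the full weight matrix while the sparse term captures corrections that a purely low-rank adapter would miss.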