PiSSA: Principal Singular Values and Singular Vectors Adaptation of Large Language Models

28 May 2024 | Fanxu Meng, Zhaohui Wang, Muhan Zhang
**Authors:** Fanxu Meng, Zhaohui Wang, Muhan Zhang
**Institution:** Institute for Artificial Intelligence, Peking University; School of Intelligence Science and Technology, Peking University; National Key Laboratory of General Artificial Intelligence, BIGAI
**GitHub:** https://github.com/GraphPKU/PiSSA

**Abstract:** To efficiently fine-tune large language models (LLMs), the Low-Rank Adaptation (LoRA) method approximates model updates with low-rank matrices. However, LoRA initializes its adapter matrices with Gaussian noise and zeros, which can lead to slow convergence. PiSSA addresses this by initializing the adapter matrices with the principal singular components of the original weight matrix while freezing the residual components, which yields faster convergence and better performance. Comparative experiments across 12 models on 5 NLG and 8 NLU tasks show that PiSSA consistently outperforms LoRA. On the GSM8K benchmark, PiSSA reaches 72.86% accuracy with Mistral-7B, surpassing LoRA's 67.7%. PiSSA is also compatible with quantization and reduces quantization error compared to QLoRA. Since it can be initialized in seconds using a fast SVD, PiSSA is a practical drop-in alternative to LoRA.

**Introduction:** Fine-tuning LLMs is crucial for improving their performance on specific tasks, but full fine-tuning is resource-intensive. Parameter-efficient fine-tuning (PEFT) methods such as LoRA aim to reduce memory usage and computational cost. LoRA approximates weight updates with low-rank matrices, but its initialization can slow convergence. PiSSA, built on the singular value decomposition (SVD), initializes the adapters with the principal singular values and vectors of the original weight matrix and freezes the residual components. Training therefore starts from the most expressive directions of each matrix, which leads to faster convergence and better performance. PiSSA is also compatible with quantization, where it reduces quantization error.

**Related Work:** PEFT methods include partial fine-tuning, soft-prompt fine-tuning, non-linear adapter fine-tuning, and low-rank adapter fine-tuning. LoRA and its variants are widely adopted because they preserve the model's original architecture while enabling efficient fine-tuning. QLoRA integrates LoRA with 4-bit quantization of the base weights; PiSSA instead focuses on tuning the essential (principal) part of each weight matrix, offering a different approach.

**PiSSA: Principal Singular Values and Singular Vectors Adaptation:** PiSSA applies SVD to each weight matrix W and partitions it into a principal component, built from the top-r singular values and vectors, and a residual component built from the remaining ones. The principal component initializes the trainable adapter matrices A and B, while the residual component W_res is frozen, so the fine-tuned weight is W = W_res + AB, the same form as LoRA. PiSSA therefore shares LoRA's architecture but begins training from the most important directions of the original matrix, which yields faster convergence and better final performance.
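A minimal sketch of this initialization for a single linear layer is shown below. It is illustrative rather than the authors' released code: the function name `pissa_init`, the shapes, and the use of `torch.svd_lowrank` as a stand-in for the fast randomized SVD mentioned in the abstract are assumptions.

```python
# Sketch of PiSSA-style initialization for one weight matrix (illustrative,
# not the official implementation). W has shape (out_features, in_features).
import torch

def pissa_init(W: torch.Tensor, r: int):
    """Split W into a frozen residual and trainable low-rank factors A, B."""
    # Randomized ("fast") SVD keeping only the top-r components: W ~ U diag(S) V^T.
    U, S, V = torch.svd_lowrank(W, q=r, niter=4)   # U: (m, r), S: (r,), V: (n, r)
    sqrt_S = S.sqrt()
    A = U * sqrt_S          # (m, r)  -> trainable
    B = (V * sqrt_S).T      # (r, n)  -> trainable
    W_res = W - A @ B       # residual -> frozen
    return W_res, A, B

# The split is exact at initialization, so the model's output is unchanged
# before training: W_res + A @ B reproduces W.
W = torch.randn(256, 128)
W_res, A, B = pissa_init(W, r=16)
assert torch.allclose(W_res + A @ B, W, atol=1e-5)
```

In a full model, this split would be applied to every adapted linear layer, with W_res replacing the original weight and A, B registered as the LoRA-style adapter.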
**Quantization:** PiSSA can be combined with quantization: the frozen residual matrix is stored in low precision (e.g., 4-bit) while the principal adapter matrices are kept in full precision and trained. Because the largest singular components have been moved into the adapter, the residual matrix has a narrower value range and incurs less quantization error than quantizing the full weight matrix as QLoRA does.
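To illustrate why removing the principal components helps, here is a toy comparison (an assumption-laden sketch, not the paper's NF4/QLoRA code): a synthetic weight matrix with a heavy-headed singular spectrum is quantized with a simple per-row 4-bit absmax quantizer, once in full (QLoRA-style) and once only on the PiSSA residual with the adapter kept in full precision.

```python
# Toy comparison of quantization error (simple absmax quantizer as a stand-in
# for NF4): quantize the full weight vs. quantize only the PiSSA residual.
import torch

def fake_quant_4bit(W: torch.Tensor) -> torch.Tensor:
    """Simulate symmetric per-row 4-bit absmax quantization (dequantized back)."""
    scale = W.abs().amax(dim=1, keepdim=True) / 7.0   # int4 symmetric range [-7, 7]
    return torch.clamp(torch.round(W / scale), -7, 7) * scale

torch.manual_seed(0)
# Synthetic weight with a decaying singular spectrum, mimicking the
# heavy-headed spectra of real LLM weight matrices.
m = n = 1024
Q1, _ = torch.linalg.qr(torch.randn(m, m))
Q2, _ = torch.linalg.qr(torch.randn(n, n))
s = 10.0 / torch.arange(1, n + 1).float()
W = Q1 @ torch.diag(s) @ Q2.T

# PiSSA split at rank r = 64 (same construction as in the sketch above).
U, S, V = torch.svd_lowrank(W, q=64, niter=4)
A, B = U * S.sqrt(), (V * S.sqrt()).T
W_res = W - A @ B

err_full = (fake_quant_4bit(W) - W).norm()                  # quantize everything
err_residual = (fake_quant_4bit(W_res) + A @ B - W).norm()  # adapter stays full precision
print(f"quantize full W:        {err_full:.3f}")
print(f"quantize residual only: {err_residual:.3f}")        # smaller: narrower value range
```

In practice the residual would be stored with a real 4-bit quantizer such as QLoRA's NF4; the toy absmax quantizer above only illustrates the effect of the residual's narrower value range.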