SVD-LLM: Truncation-aware Singular Value Decomposition for Large Language Model Compression

2025 | Xin Wang, Yu Zheng, Zhongwei Wan, Mi Zhang
SVD-LLM is a post-training singular value decomposition (SVD) based method for compressing large language models (LLMs). It addresses two key limitations of existing SVD-based compression methods: (1) truncating the smallest singular values does not necessarily minimize compression loss, and (2) parameters are not updated after SVD truncation. SVD-LLM incorporates a truncation-aware data whitening technique that establishes a direct mapping between singular values and compression loss, and a parameter update with sequential low-rank approximation that compensates for accuracy degradation after compression.

The method is evaluated on 10 datasets and seven models from three LLM families at three different scales. Results show that SVD-LLM outperforms state-of-the-art SVD-based and other compression methods, especially at high compression ratios. When deployed on real hardware, including both GPU and CPU, SVD-LLM delivers inference speedup and memory reduction, and it also reduces runtime KV cache memory without additional accuracy drop. Compared with other types of LLM compression methods, including pruning and quantization, it demonstrates superior performance. SVD-LLM can further be combined with 2-bit post-training quantization to achieve state-of-the-art compression without incurring expensive retraining. Overall, the results show that SVD-LLM compresses LLMs effectively while maintaining high accuracy and efficiency.
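The key property of truncation-aware data whitening is that, after whitening the weight matrix with a Cholesky factor of the calibration activations' Gram matrix, the compression loss on that data equals exactly the norm of the truncated singular values. The following is a minimal NumPy sketch of that idea; the matrix shapes and variable names are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, n, k = 8, 6, 32, 3          # toy sizes; keep rank k (assumption)

W = rng.normal(size=(d_out, d_in))       # weight matrix of one linear layer
X = rng.normal(size=(d_in, n))           # calibration activations

# Truncation-aware whitening: Cholesky factor S with X X^T = S S^T
S = np.linalg.cholesky(X @ X.T)

# SVD of the whitened weight W S
U, sigma, Vt = np.linalg.svd(W @ S, full_matrices=False)

# Truncate to the top-k singular values, then fold S^{-1} back in
W_low = (U[:, :k] * sigma[:k]) @ Vt[:k] @ np.linalg.inv(S)

# The compression loss on the calibration data equals the norm of the
# truncated singular values -- the direct mapping SVD-LLM relies on
loss = np.linalg.norm((W - W_low) @ X, "fro")
assert np.isclose(loss, np.linalg.norm(sigma[k:]))
```

Because of this identity, dropping the smallest singular values of the whitened matrix is provably the loss-minimizing truncation on the calibration data, which plain SVD of the raw weight does not guarantee.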