SVD-LLM: Truncation-aware Singular Value Decomposition for Large Language Model Compression

2025 | Xin Wang, Yu Zheng, Zhongwei Wan, Mi Zhang
SVD-LLM is a post-training singular value decomposition (SVD) based method for compressing large language models (LLMs). It addresses two key limitations of existing SVD-based compression methods: (1) truncating the smallest singular values does not necessarily minimize compression loss, and (2) parameters are not updated after SVD truncation. SVD-LLM incorporates a truncation-aware data whitening technique that establishes a direct mapping between singular values and compression loss, and a parameter update with sequential low-rank approximation that compensates for accuracy degradation after compression.

The method is evaluated on 10 datasets and seven models from three LLM families at three different scales. Results show that SVD-LLM outperforms state-of-the-art SVD-based and other compression methods, especially at high compression ratios. When deployed on real hardware, including both GPU and CPU, SVD-LLM delivers inference speedup and memory reduction, and it also reduces runtime KV cache memory without additional accuracy drop. Compared with other types of LLM compression methods, including pruning and quantization, it demonstrates superior performance. SVD-LLM can further be combined with 2-bit post-training quantization to achieve state-of-the-art compression without incurring expensive retraining. Overall, the results show that SVD-LLM compresses LLMs effectively while maintaining high accuracy and efficiency.
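The key property of truncation-aware data whitening is that, after whitening the weight matrix with a Cholesky factor of the calibration activations' Gram matrix, the compression loss on that data equals exactly the norm of the truncated singular values. The following is a minimal NumPy sketch of that idea; the matrix shapes and variable names are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, n, k = 8, 6, 32, 3          # toy sizes; keep rank k (assumption)

W = rng.normal(size=(d_out, d_in))       # weight matrix of one linear layer
X = rng.normal(size=(d_in, n))           # calibration activations

# Truncation-aware whitening: Cholesky factor S with X X^T = S S^T
S = np.linalg.cholesky(X @ X.T)

# SVD of the whitened weight W S
U, sigma, Vt = np.linalg.svd(W @ S, full_matrices=False)

# Truncate to the top-k singular values, then fold S^{-1} back in
W_low = (U[:, :k] * sigma[:k]) @ Vt[:k] @ np.linalg.inv(S)

# The compression loss on the calibration data equals the norm of the
# truncated singular values -- the direct mapping SVD-LLM relies on
loss = np.linalg.norm((W - W_low) @ X, "fro")
assert np.isclose(loss, np.linalg.norm(sigma[k:]))
```

Because of this identity, dropping the smallest singular values of the whitened matrix is provably the loss-minimizing truncation on the calibration data, which plain SVD of the raw weight does not guarantee.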