LaCo: Large Language Model Pruning via Layer Collapse

17 Feb 2024 | Yifei Yang, Zouying Cao, Hai Zhao
Yifei Yang, Zouying Cao, and Hai Zhao propose Layer Collapse (LaCo), a layer-wise pruning method for large language models (LLMs). LaCo collapses rear model layers into a prior layer, enabling a rapid reduction in model size while preserving the overall model structure. Comprehensive experiments show that LaCo maintains an average task performance of over 80% at pruning ratios of 25-30%, significantly outperforming existing state-of-the-art structured pruning methods. LaCo also preserves the internal structure of each layer, for example keeping intermediate dimensions unchanged, so pruned models can be quickly adapted to existing applications. Post-training experiments confirm that LaCo effectively inherits the parameters of the original model: only a minimal amount of training is needed to restore the pruned model to the loss convergence level of the original model.

LaCo is built on Reserving-Differences-while-Seeking-Common (RDSC) Layer Merge, which collapses a group of consecutive layers into a single layer through parameter differencing and merging.
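As a rough illustration of what parameter differencing and merging can look like, the sketch below collapses a group of consecutive decoder layers into a single layer: the merged layer keeps the base layer's parameters and adds each later layer's difference from that base. This is a minimal sketch assuming PyTorch modules with identical parameter layouts; the function name `rdsc_layer_merge` and its arguments are illustrative choices, not the authors' reference implementation.

```python
import copy
import torch

def rdsc_layer_merge(layers, start, num_merge):
    """Collapse layers[start .. start + num_merge] into one layer.

    The merged parameters are theta_start plus the sum of the differences
    (theta_{start+k} - theta_start) for k = 1 .. num_merge: "seek common"
    by keeping the base layer, "reserve differences" by adding what the
    later layers change relative to it.
    """
    merged = copy.deepcopy(layers[start])
    base_state = layers[start].state_dict()
    merged_state = merged.state_dict()
    with torch.no_grad():
        for k in range(1, num_merge + 1):
            for name, tensor in layers[start + k].state_dict().items():
                merged_state[name] += tensor - base_state[name]
    merged.load_state_dict(merged_state)
    return merged
```

Because the operation is only elementwise differencing and addition of like-shaped tensors, the merged layer has exactly the same shapes and intermediate dimensions as any original layer, which is why the pruned model keeps the original architecture.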
Starting from the topmost layer of the model, LaCo dynamically merges adjacent layers while ensuring that the pruned model's output representations on few-shot calibration samples remain as similar as possible to those of the original model. Algorithm 1 summarizes the Layer Collapse procedure, which iteratively merges candidate layers and evaluates the similarity of representations before accepting each merge.
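The iterative procedure can be pictured as the greedy top-down loop sketched below: tentatively collapse a group of adjacent layers with the RDSC sketch above, compare the pruned model's output representations with the original model's on a few calibration samples, and keep the merge only if the similarity clears a threshold. Everything here is an assumption made for illustration rather than the paper's exact Algorithm 1: the cosine-similarity measure over final hidden states, the `merge_count` and `threshold` defaults, and the `model.model.layers` layout of a Llama-style Hugging Face model.

```python
import copy
import torch
import torch.nn.functional as F

def output_similarity(model_a, model_b, calibration_batches):
    """Mean cosine similarity of the two models' final hidden states on a
    handful of calibration batches (dicts of tokenized inputs)."""
    sims = []
    with torch.no_grad():
        for batch in calibration_batches:
            h_a = model_a(**batch, output_hidden_states=True).hidden_states[-1]
            h_b = model_b(**batch, output_hidden_states=True).hidden_states[-1]
            sims.append(F.cosine_similarity(h_a.flatten(1), h_b.flatten(1)).mean())
    return torch.stack(sims).mean().item()

def layer_collapse(model, calibration_batches, merge_count=3, threshold=0.9):
    """Greedy top-down layer collapse: slide a merge window from the top of
    the model toward the bottom, accepting a merge only if the pruned model
    still represents the calibration samples similarly enough."""
    original = copy.deepcopy(model)                    # reference for similarity checks
    idx = len(model.model.layers) - 1 - merge_count    # base layer of the topmost group
    while idx >= 0:
        if idx + merge_count < len(model.model.layers):
            candidate = copy.deepcopy(model)           # a real implementation would avoid full copies
            merged = rdsc_layer_merge(candidate.model.layers, idx, merge_count)
            kept = list(candidate.model.layers)
            kept[idx:idx + merge_count + 1] = [merged]      # collapse the group into one layer
            candidate.model.layers = torch.nn.ModuleList(kept)
            candidate.config.num_hidden_layers = len(kept)  # keep the config consistent
            if output_similarity(original, candidate, calibration_batches) >= threshold:
                model = candidate                      # accept the merge; the model just got shallower
        idx -= 1                                       # move the window one layer down and try again
    return model
```

A full implementation would also control how often merges may occur and over which layer range, but the loop above captures the core merge-then-check idea described in the summary: the calibration samples act as a cheap proxy for whether a collapsed group of layers still behaves like the layers it replaced.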
Experiments on popular English LLMs (Llama2-7B and 13B) and bilingual LLMs (Baichuan2-7B and 13B) show that LaCo achieves the best results on most benchmarks, with an average performance percentage across all datasets far above the baselines, and it is notably stable, maintaining performance above 70% on most benchmarks. Post-training experiments show that a LaCo-pruned model inherits the original model's parameters well and rapidly recovers performance with minimal post-training, and re-pruning experiments show that the post-trained model can be pruned further to only around 50% of the original parameters while still maintaining relatively good performance.

The motivation for LaCo is the observation that the changes in parameters and output representations between adjacent layers of an LLM are not particularly significant; this high similarity between adjacent layers suggests that multiple layers can be replaced by a single one. The method is straightforward, relying solely on parameter differences and additions without modifying the model's internal structure, which yields a concise and efficient pruning solution that requires no special hardware support and preserves the intrinsic structure of the model. Experimental results demonstrate that LaCo significantly outperforms existing SOTA structured pruning methods.