Pruner-Zero: Evolving Symbolic Pruning Metric from scratch for Large Language Models

2024 | Peijie Dong, Lujun Li, Zhenheng Tang, Xiang Liu, Xinglin Pan, Qiang Wang, Xiaowen Chu
Pruner-Zero is a framework for automatically discovering symbolic pruning metrics for Large Language Models (LLMs) using genetic programming. It addresses the challenge of efficiently identifying superior pruning metrics that can be used for post-training pruning without retraining or weight updates; existing pruning methods often rely on human expertise and are computationally expensive. Pruner-Zero defines an evolutionary search space that contains existing pruning metrics and employs genetic programming to evolve new symbolic metrics, together with an opposing operation simplification strategy that reduces redundancy in the search space and improves search efficiency.

The symbolic pruning metric is represented as an expression tree, with terminal nodes standing for variables such as weights, gradients, and activations, and internal nodes standing for mathematical operations. The search uses tournament selection, subtree crossover, and node mutation to generate and refine candidate metrics, and the fitness of each metric is the perplexity obtained after post-training pruning on the WikiText2 dataset. Pruner-Zero outperforms existing post-training pruning methods such as SparseGPT and Wanda, achieving lower perplexity without weight updates.
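To make the expression-tree formulation concrete, here is a minimal sketch (plain NumPy, not the authors' implementation) of a symbolic pruning metric represented as a tree and evaluated into per-weight importance scores. The example expression |W| · ||X||, the small operator set, and the per-row top-k pruning rule are illustrative assumptions; the expression corresponds to the Wanda metric, one of the hand-designed metrics contained in the search space, not the metric Pruner-Zero ultimately discovers.

```python
# Minimal sketch: a symbolic pruning metric as an expression tree (illustrative only).
import numpy as np

class Node:
    """Expression-tree node: a terminal symbol (e.g. W, G, X) or an operation over children."""
    def __init__(self, op, children=()):
        self.op = op                # e.g. "W", "X", "abs", "mul"
        self.children = list(children)

    def evaluate(self, env):
        if self.op in env:          # terminal node: look up the corresponding tensor
            return env[self.op]
        args = [c.evaluate(env) for c in self.children]
        if self.op == "abs":
            return np.abs(args[0])
        if self.op == "mul":
            return args[0] * args[1]
        if self.op == "add":
            return args[0] + args[1]
        raise ValueError(f"unknown operation {self.op}")

def prune_mask(scores, sparsity):
    """Keep the top-(1 - sparsity) fraction of weights in each output row."""
    mask = np.zeros_like(scores, dtype=bool)
    k = int(scores.shape[1] * (1.0 - sparsity))
    idx = np.argsort(-scores, axis=1)[:, :k]
    np.put_along_axis(mask, idx, True, axis=1)
    return mask

# Example: the Wanda-style metric |W| * ||X||_2 written as an expression tree.
metric = Node("mul", [Node("abs", [Node("W")]), Node("X")])

W = np.random.randn(8, 16)                   # toy weight matrix
X = np.abs(np.random.randn(1, 16))           # per-input-channel activation norms
scores = metric.evaluate({"W": W, "X": X})   # per-weight importance scores
W_pruned = W * prune_mask(scores, sparsity=0.5)
print(f"sparsity achieved: {np.mean(W_pruned == 0):.2f}")
```

In the actual framework, the terminals would be populated per layer from the LLM's weights, gradients, and calibration activations, and the resulting mask would be applied directly, without any weight update.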
Extensive experiments on LLaMA and LLaMA-2 demonstrate that Pruner-Zero achieves state-of-the-art performance in both language modeling and zero-shot tasks. The framework is also applied to other LLM families, including OPT and Tiny-LLaMA, showing its generalizability. The results indicate that Pruner-Zero is particularly effective for larger models, such as LLaMA-30B and LLaMA-2-70B. The framework's symbolic pruning metric is also shown to be robust across different sparsity ratios and calibration sample sizes. The effectiveness of the opposing operation simplification strategy is confirmed through ablation studies, which demonstrate its role in reducing redundancy and improving search efficiency. The framework's ability to automatically discover effective pruning metrics without human intervention represents a significant advancement in the field of model compression and optimization.
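The evolutionary search itself can be sketched in the same spirit. The toy loop below applies tournament selection, subtree crossover, node mutation, and an opposing-operation simplification pass to expression trees encoded as nested lists; the operator set, population size, number of generations, and especially the fitness function are placeholder assumptions, since in the paper fitness is the WikiText2 perplexity of the model pruned with the candidate metric.

```python
# Toy genetic-programming loop over symbolic pruning metrics (illustrative assumptions).
import copy
import random

random.seed(0)
OPS = {"abs": 1, "neg": 1, "exp": 1, "log": 1, "add": 2, "sub": 2, "mul": 2, "div": 2}
TERMINALS = ["W", "G", "X"]                       # weights, gradients, activations
OPPOSING = {("exp", "log"), ("log", "exp"), ("neg", "neg")}

def random_tree(depth=3):
    if depth == 0 or random.random() < 0.3:
        return [random.choice(TERMINALS)]
    op = random.choice(list(OPS))
    return [op] + [random_tree(depth - 1) for _ in range(OPS[op])]

def all_subtrees(tree):
    yield tree
    for child in tree[1:]:
        yield from all_subtrees(child)

def simplify(tree):
    """Opposing-operation simplification, e.g. log(exp(x)) -> x."""
    tree = [tree[0]] + [simplify(c) for c in tree[1:]]
    if len(tree) == 2 and (tree[0], tree[1][0]) in OPPOSING:
        return tree[1][1]                          # drop the cancelling pair
    return tree

def crossover(a, b):
    """Subtree crossover: replace a random subtree of a with a random subtree of b."""
    a = copy.deepcopy(a)
    target = random.choice(list(all_subtrees(a)))
    target[:] = copy.deepcopy(random.choice(list(all_subtrees(b))))
    return a

def mutate(tree):
    """Node mutation: replace a random subtree with a freshly generated one."""
    tree = copy.deepcopy(tree)
    target = random.choice(list(all_subtrees(tree)))
    target[:] = random_tree(depth=2)
    return tree

def fitness(tree):
    # Placeholder fitness; the paper prunes the LLM with the candidate metric and
    # uses WikiText2 perplexity of the pruned model (lower is better).
    return (hash(str(tree)) % 1000) / 1000.0

def tournament(population, k=3):
    return min(random.sample(population, k), key=fitness)

population = [simplify(random_tree()) for _ in range(20)]
for generation in range(10):
    parent_a, parent_b = tournament(population), tournament(population)
    child = simplify(mutate(crossover(parent_a, parent_b)))
    population.append(child)
    population.remove(max(population, key=fitness))    # drop the worst individual
print("best metric expression:", min(population, key=fitness))
```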