Rethinking the Value of Network Pruning

5 Mar 2019 | Zhuang Liu, Mingjie Sun, Tinghui Zhou, Gao Huang, Trevor Darrell
This paper reconsiders the value of network pruning, a technique for reducing the computational cost of deep models in resource-constrained settings. The authors challenge two common beliefs: that training a large, over-parameterized model is necessary, and that fine-tuning a pruned model with inherited weights outperforms training it from scratch.

Through extensive empirical evaluation on multiple datasets and network architectures, they find that for structured pruning methods with predefined target architectures, training the small target model from random initialization achieves comparable or better performance than fine-tuning the pruned model. For pruning methods that automatically discover the target architecture, training from scratch likewise yields similar or better results. These findings suggest that the value of structured pruning lies more in identifying efficient architectures than in inheriting weights from a large model. The authors also discuss the implications for architecture search and compare their results with the "Lottery Ticket Hypothesis," concluding that random initialization is sufficient for these pruning methods to reach competitive performance. Overall, the paper advocates more careful evaluation of structured pruning methods and highlights the potential of pruning as an architecture search paradigm.
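To make the setting concrete, the sketch below illustrates one structured pruning criterion of the kind the paper evaluates: ranking convolutional filters by L1 norm and keeping a predefined fraction. This is an illustrative assumption, not the authors' code; the function name, weight layout, and keep ratio are hypothetical. The paper's point is that the surviving architecture (the kept filter count per layer) can simply be trained from scratch rather than fine-tuned with the inherited weights.

```python
# Illustrative sketch (assumed, not from the paper): L1-norm filter
# ranking, a common structured-pruning criterion with a predefined
# target architecture.
def l1_filter_prune(filters, keep_ratio):
    """Rank filters by L1 norm and keep the top fraction.

    filters: list of filters, each a flat list of weights
             (hypothetical layout for illustration).
    Returns the sorted indices of the kept filters; the resulting
    filter count defines the small target architecture.
    """
    # L1 norm (sum of absolute weights) per filter
    norms = [sum(abs(w) for w in f) for f in filters]
    k = max(1, int(len(filters) * keep_ratio))
    # Indices ranked by descending L1 norm; keep the top k
    ranked = sorted(range(len(filters)), key=lambda i: norms[i], reverse=True)
    return sorted(ranked[:k])

# Example: 4 filters, keep half -> the two largest-norm filters survive.
filters = [[0.1, -0.2], [1.0, 0.5], [0.0, 0.05], [-0.7, 0.3]]
print(l1_filter_prune(filters, 0.5))  # [1, 3]
```

Under the paper's finding, one would instantiate a fresh network with only the kept filters and train it from random initialization, rather than fine-tuning the pruned weights.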