14 Jun 2020 | Yu Cheng, Duo Wang, Pan Zhou, Member, IEEE, and Tao Zhang, Senior Member, IEEE
This paper reviews recent techniques for model compression and acceleration of deep neural networks (DNNs). DNNs have achieved great success in visual recognition tasks, but they are computationally expensive and memory-intensive, which makes them difficult to deploy on low-resource devices or in applications with strict latency requirements. To address this, a variety of methods have been developed to compress and accelerate DNNs without significantly reducing their performance. These techniques fall into four main categories: parameter pruning and quantization, low-rank factorization, transferred/compact convolutional filters, and knowledge distillation, in which a smaller student model is trained to mimic the behavior of a larger teacher model. Each category is analyzed in terms of its performance, applications, advantages, and drawbacks, and recent successful methods such as dynamic capacity networks and stochastic depth networks are also discussed. The paper further surveys evaluation metrics, datasets, and benchmarking efforts, highlights the importance of model compression for efficient deployment on portable devices, and examines the trade-offs among compression rate, computational efficiency, and model accuracy. It also addresses open challenges such as hardware constraints, the need for better configuration strategies, and the influence of prior knowledge on model performance.
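The survey does not prescribe a single algorithm for the pruning-and-quantization category, but the general idea can be illustrated with a minimal sketch. The example below assumes nothing beyond NumPy and a made-up weight matrix: it prunes weights by magnitude and then applies uniform 8-bit quantization. It is an illustration of the category, not a reproduction of any specific method reviewed in the paper.

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude weights until `sparsity` fraction is zero."""
    k = int(sparsity * weights.size)
    if k == 0:
        return weights.copy()
    threshold = np.sort(np.abs(weights).ravel())[k - 1]
    pruned = weights.copy()
    pruned[np.abs(pruned) <= threshold] = 0.0
    return pruned

def quantize_uint8(weights: np.ndarray):
    """Uniform 8-bit quantization: store uint8 codes plus a scale and an offset."""
    w_min, w_max = weights.min(), weights.max()
    scale = (w_max - w_min) / 255.0 if w_max > w_min else 1.0
    codes = np.round((weights - w_min) / scale).astype(np.uint8)
    return codes, scale, w_min

def dequantize(codes: np.ndarray, scale: float, w_min: float) -> np.ndarray:
    """Recover approximate float weights from the 8-bit codes."""
    return codes.astype(np.float32) * scale + w_min

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.normal(size=(256, 256)).astype(np.float32)  # toy weight matrix
    w_pruned = magnitude_prune(w, sparsity=0.9)          # keep only the largest 10%
    codes, scale, w_min = quantize_uint8(w_pruned)       # 4x smaller than float32 storage
    w_hat = dequantize(codes, scale, w_min)
    print("sparsity:", np.mean(w_pruned == 0.0))
    print("max quantization error:", np.max(np.abs(w_hat - w_pruned)))
```

In practice the pruned, quantized weights would be stored in a sparse or codebook format and fine-tuned to recover accuracy; the sketch only shows the compression step itself.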
The authors suggest that future work focus on improving compression techniques, exploring hardware-aware approaches, and extending model compression to a broader range of tasks and deep learning models.
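To make the knowledge-distillation idea mentioned above concrete, the sketch below computes a standard distillation loss: temperature-softened teacher and student outputs are compared with a KL-divergence term and combined with the usual cross-entropy on the true labels, following the formulation popularized by Hinton et al. The logits, temperature, and weighting here are illustrative assumptions, not values taken from the surveyed papers.

```python
import numpy as np

def softmax(logits: np.ndarray, temperature: float = 1.0) -> np.ndarray:
    """Temperature-scaled softmax along the last axis."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature: float = 4.0, alpha: float = 0.7) -> float:
    """Weighted sum of a soft-target KL term and the hard-label cross-entropy.

    The T^2 factor keeps the soft-target gradients on the same scale as the
    hard-label term, as in the standard distillation formulation.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = np.sum(p_teacher * (np.log(p_teacher + 1e-12) - np.log(p_student + 1e-12)), axis=-1)
    soft_loss = (temperature ** 2) * kl.mean()

    p_hard = softmax(student_logits)  # temperature = 1 for the true labels
    ce = -np.log(p_hard[np.arange(len(labels)), labels] + 1e-12).mean()

    return alpha * soft_loss + (1.0 - alpha) * ce

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    teacher = rng.normal(size=(8, 10))                   # toy logits from a large "teacher"
    student = teacher + 0.5 * rng.normal(size=(8, 10))   # smaller "student", imperfect copy
    labels = rng.integers(0, 10, size=8)
    print("distillation loss:", distillation_loss(student, teacher, labels))
```

Minimizing this loss trains the compact student to reproduce the teacher's softened output distribution while still fitting the ground-truth labels, which is the mechanism by which distillation transfers the larger model's behavior into a smaller one.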