A Survey of Model Compression and Acceleration for Deep Neural Networks


14 Jun 2020 | Yu Cheng, Duo Wang, Pan Zhou, Member, IEEE, and Tao Zhang, Senior Member, IEEE
This paper provides a comprehensive review of recent techniques for compressing and accelerating deep neural networks (DNNs). The techniques are categorized into four main groups: parameter pruning and quantization, low-rank factorization, transferred/compact convolutional filters, and knowledge distillation. Each category is described in detail, including its methods, performance, advantages, and drawbacks. The paper also discusses recent successful methods such as dynamic capacity networks and stochastic depth networks, and reviews evaluation metrics, datasets, and benchmark efforts. Finally, it concludes by discussing remaining challenges and potential future directions, emphasizing the importance of combining these techniques to maximize gains in various applications.
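To make the first category concrete, below is a minimal sketch of unstructured magnitude-based weight pruning, one of the simplest parameter-pruning approaches the survey covers. The function name, the NumPy implementation, and the 90% sparsity target are illustrative assumptions rather than the method of any particular surveyed paper; practical pipelines typically interleave pruning with retraining to recover accuracy.

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude fraction of weights.

    A generic illustration of unstructured magnitude pruning;
    the surveyed methods differ in granularity (weights, filters,
    channels) and in their retraining schedules.
    """
    k = int(weights.size * sparsity)
    if k == 0:
        return weights.copy()
    # k-th smallest absolute value serves as the pruning threshold.
    threshold = np.partition(np.abs(weights).ravel(), k - 1)[k - 1]
    mask = np.abs(weights) > threshold
    return weights * mask

# Example: prune 90% of a random 256x256 weight matrix.
w = np.random.randn(256, 256)
w_pruned = magnitude_prune(w, sparsity=0.9)
print(f"nonzero fraction: {np.count_nonzero(w_pruned) / w.size:.3f}")
```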
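Likewise, the knowledge distillation category can be illustrated with the classic Hinton-style loss, in which a compact student network is trained to match the temperature-softened output distribution of a larger teacher. This PyTorch sketch assumes illustrative values for the temperature and the mixing weight alpha; the survey reviews many variants beyond this basic form.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      labels: torch.Tensor,
                      temperature: float = 4.0,
                      alpha: float = 0.5) -> torch.Tensor:
    """Weighted sum of cross-entropy on hard labels and KL divergence
    between temperature-softened teacher and student distributions
    (Hinton et al., 2015). Hyperparameters here are illustrative."""
    hard = F.cross_entropy(student_logits, labels)
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2  # rescale so gradient magnitudes match the hard term
    return alpha * hard + (1.0 - alpha) * soft
```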