Exploiting Linear Structure Within Convolutional Networks for Efficient Evaluation


9 Jun 2014 | Remi Denton, Wojciech Zaremba, Joan Bruna, Yann LeCun and Rob Fergus
This paper presents techniques to accelerate the evaluation of large convolutional networks (CNNs) for object recognition tasks. These models, while highly accurate, are computationally intensive, requiring millions of floating-point operations per image. The authors propose methods that reduce the computational load by exploiting redundancy in the convolutional filters, achieving speedups of 2-3× on both CPU and GPU while keeping accuracy within 1% of the original model.

The key idea is to compress CNNs by finding low-rank approximations of their weights, using singular value decomposition (SVD) and related tensor decompositions to reduce both the number of operations and the number of parameters. Two approximations are introduced for convolutional layers: monochromatic and biclustering. The monochromatic approximation reduces the color dimension of the first-layer filters, while the biclustering approximation groups input and output features into clusters and approximates the corresponding weight sub-tensors.
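To make the low-rank idea concrete, here is a minimal NumPy sketch of compressing a dense (fully connected) layer with a truncated SVD. The layer size and the rank K are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((1024, 1024)).astype(np.float32)  # dense layer weights (illustrative size)
x = rng.standard_normal(1024).astype(np.float32)          # one input activation vector

# Factor W with an SVD and keep only the top-K singular directions.
U, s, Vt = np.linalg.svd(W, full_matrices=False)
K = 128
U_k = U[:, :K] * s[:K]   # fold singular values into the left factor: (1024, K)
Vt_k = Vt[:K, :]         # (K, 1024)

y_full = W @ x                  # 1024*1024 multiply-adds
y_lowrank = U_k @ (Vt_k @ x)    # 2*1024*K multiply-adds -- 4x fewer for K=128

rel_err = np.linalg.norm(y_full - y_lowrank) / np.linalg.norm(y_full)
print(f"relative error at rank {K}: {rel_err:.3f}")
```

Random weights have an almost flat singular-value spectrum, so the reconstruction error in this toy example is large; trained network weights are far more redundant, which is precisely the structure the paper exploits.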
The paper describes a two-step process: the weights of each convolutional layer are first compressed with a low-rank approximation, and the upper layers are then fine-tuned to restore prediction performance. The methods are tested on state-of-the-art ImageNet CNNs and achieve significant reductions in computation and memory usage: the number of operations is reduced by 2-3×, and the number of parameters in the fully connected layers is reduced by 5-10×. The techniques are effective on both CPU and GPU platforms, and the different approximations can be combined for even greater speedups. The reduced memory footprint also makes the networks more suitable for deployment on mobile devices and embedded systems.

These techniques are orthogonal to other optimizations such as quantization and Fourier-domain processing, and can be used in combination with them to further improve efficiency. The paper concludes that these methods enable efficient evaluation of large CNNs, making them more practical for real-world applications.
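The monochromatic approximation described earlier can be sketched in the same spirit. The snippet below is an illustration under stated assumptions, not the authors' implementation: first-layer weights are assumed to have shape (F, 3, h, w), the per-filter rank-1 color decomposition is done with SVD, and the color transforms are grouped with scikit-learn's KMeans as a stand-in clustering step.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
F, h, w = 96, 7, 7                                   # illustrative first-layer shape
W = rng.standard_normal((F, 3, h, w)).astype(np.float32)

# Rank-1 decomposition of each filter along the color axis:
# W[f] ~= outer(color[f], spatial[f]), with color[f] in R^3.
Wc = W.reshape(F, 3, h * w)
colors = np.empty((F, 3), dtype=np.float32)
spatial = np.empty((F, h * w), dtype=np.float32)
for f in range(F):
    U, s, Vt = np.linalg.svd(Wc[f], full_matrices=False)
    colors[f] = U[:, 0] * s[0]
    spatial[f] = Vt[0]

# Cluster the per-filter color transforms so that all filters in a cluster
# share one 3 -> 1 color projection of the input image.
C = 8
km = KMeans(n_clusters=C, n_init=10, random_state=0).fit(colors)
shared = km.cluster_centers_[km.labels_]             # (F, 3): each filter's shared color

# Reconstruct the approximated filter bank and measure the error.
W_approx = (shared[:, :, None] * spatial[:, None, :]).reshape(F, 3, h, w)
print("relative error:", np.linalg.norm(W - W_approx) / np.linalg.norm(W))
```

At evaluation time the input image is projected once into C monochromatic channels, and each spatial filter then convolves a single channel instead of all three colors, which is where the first-layer savings come from.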