9 Jun 2014 | Remi Denton, Wojciech Zaremba, Joan Bruna, Yann LeCun and Rob Fergus
This paper presents techniques for speeding up the test-time evaluation of large convolutional neural networks (CNNs) designed for object recognition. While these models deliver impressive accuracy, each image evaluation requires millions of floating-point operations, making them expensive to deploy on resource-constrained devices such as smartphones and costly to run on Internet-scale clusters. The authors exploit the redundancy within convolutional filters to derive approximations that significantly reduce the required computation. Using state-of-the-art models, they demonstrate speedups of up to 2-3x on both CPU and GPU for the convolutional layers while keeping accuracy within 1% of the original model.

The main contributions are generic methods for exploiting the redundancy in deep CNNs, empirical speedups on ImageNet-scale CNNs, and parameter reductions in the fully connected layers by factors of 5-10. The paper discusses several approximation metrics and low-rank tensor approximations, including matrix (SVD) decompositions and higher-order tensor decompositions. It also introduces approximations tailored to particular layers, such as a monochromatic approximation for the first convolutional layer and a biclustering approximation for the higher convolutional layers, and describes fine-tuning techniques that restore most of the lost performance. Experimental results show significant speedups and memory savings with minimal performance degradation.
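
To make the low-rank idea concrete, here is a minimal sketch (not the authors' code) of applying a truncated SVD to a fully connected layer: the weight matrix W is replaced by two thin factors, cutting both parameter storage and multiply-adds when the kept rank k is much smaller than the layer dimensions. The layer sizes and rank below are hypothetical, and the matrix is synthesized to be approximately low-rank as a stand-in for trained weights, which the paper shows exhibit similar redundancy.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 1024x1024 fully connected layer, built to be approximately
# low-rank (rank-64 signal plus small noise) to mimic the redundancy of
# trained FC weights.
m, n, true_rank, k = 1024, 1024, 64, 100
W = rng.standard_normal((m, true_rank)) @ rng.standard_normal((true_rank, n)) / np.sqrt(true_rank)
W += 0.05 * rng.standard_normal((m, n))
x = rng.standard_normal(n)

# Truncated SVD: W ~= U_k diag(s_k) V_k^T, with singular values folded
# into the left factor.
U, s, Vt = np.linalg.svd(W, full_matrices=False)
U_k = U[:, :k] * s[:k]
Vt_k = Vt[:k, :]

y_full = W @ x             # original layer: m*n multiply-adds
y_low = U_k @ (Vt_k @ x)   # factored layer: k*(m+n) multiply-adds

print("parameter reduction: %.1fx" % ((m * n) / (k * (m + n))))
print("relative output error: %.4f" % (np.linalg.norm(y_full - y_low) / np.linalg.norm(y_full)))
```

With these sizes the factorization stores about 5x fewer parameters, in line with the 5-10x reductions the paper reports for fully connected layers; the achievable rank (and hence the saving) depends on how redundant the trained weights actually are.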
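
The monochromatic first-layer approximation factors each RGB filter into a 3-vector of color weights and a single 2D spatial filter, so the input can be projected down to one channel per filter (or per filter cluster, in the paper's clustered variant) before cheap single-channel convolutions are applied. Below is a minimal sketch of the per-filter rank-1 factorization only, using randomly generated filters as a stand-in; trained first-layer filters have color components lying close to a one-dimensional subspace, which is what makes this approximation accurate in practice.

```python
import numpy as np

rng = np.random.default_rng(1)
F, C, d = 96, 3, 11                        # e.g. 96 filters of size 3x11x11
filters = rng.standard_normal((F, C, d, d))

color, mono = [], []
for f in range(F):
    M = filters[f].reshape(C, d * d)       # 3 x (d*d) matrix for this filter
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    color.append(U[:, 0] * s[0])           # per-filter color weights (length 3)
    mono.append(Vt[0].reshape(d, d))       # monochrome spatial filter (d x d)

color = np.stack(color)                    # (F, 3)
mono = np.stack(mono)                      # (F, d, d)

# Rank-1 reconstruction of every filter and its relative error.
approx = np.einsum('fc,fij->fcij', color, mono)
err = np.linalg.norm(approx - filters) / np.linalg.norm(filters)
print("relative filter reconstruction error: %.3f" % err)
```

In the paper, the color vectors are additionally clustered so that groups of filters share a common color transform, which further reduces the cost of the initial projection.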