Speeding-up Convolutional Neural Networks Using Fine-tuned CP-Decomposition

24 Apr 2015 | Vadim Lebedev, Yaroslav Ganin, Maksim Rakhuba, Ivan Oseledets, Victor Lempitsky
The paper proposes a two-step method to speed up the convolution layers in large convolutional neural networks (CNNs) using tensor decomposition and fine-tuning. The first step computes a low-rank CP-decomposition of the 4D convolution kernel tensor using non-linear least squares, approximating the kernel as a sum of rank-one tensors. The second step uses this decomposition to replace the original convolutional layer with a sequence of four convolutional layers with smaller kernels. The entire network is then fine-tuned on training data using standard backpropagation.

The method is evaluated on two CNNs: a 36-class character classification CNN and AlexNet. For the character classification CNN, the approach achieves an 8.5x CPU speedup with only a 1% drop in accuracy. For AlexNet, it speeds up the second convolution layer by a factor of 4x at the cost of a 1% increase in the overall top-5 classification error.

The paper highlights the advantages of the CP-decomposition approach: the decomposition, the resulting CNN, and the fine-tuning are all easy to implement, and the method achieves a better speed-accuracy trade-off than previous approaches. The results confirm the intuition that modern CNNs are over-parameterized, and the reduction in parameters also yields more compact networks with a smaller memory footprint.
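As an illustration of the layer replacement, here is a minimal PyTorch sketch that decomposes a Conv2d kernel and rebuilds it as the four smaller convolutions described above: a 1x1 layer projecting the S input channels onto R CP components, vertical and horizontal grouped convolutions filtering each component spatially, and a final 1x1 layer mapping the R components back to the T output channels. Note the hedges: it uses tensorly's parafac, which runs plain ALS rather than the non-linear least squares solver the paper advocates, and the helper name cp_decompose_conv, the random initialization, and the bias handling are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn as nn
import tensorly as tl
from tensorly.decomposition import parafac

tl.set_backend('pytorch')


def cp_decompose_conv(layer: nn.Conv2d, rank: int) -> nn.Sequential:
    """Rebuild a Conv2d as the four-layer CP sequence (illustrative sketch).

    NOTE: parafac here is tensorly's ALS solver, not the non-linear
    least squares fitting used in the paper. Stride handling is omitted
    for brevity.
    """
    T, S, kh, kw = layer.weight.shape  # (out ch., in ch., height, width)

    # Factors follow the kernel's modes: U_t (T x R), U_s (S x R),
    # U_y (kh x R, vertical), U_x (kw x R, horizontal).
    weights, (U_t, U_s, U_y, U_x) = parafac(
        layer.weight.data, rank=rank, init='random')
    U_t = U_t * weights  # absorb the CP component scales into one factor

    # 1) 1x1 conv: project the S input channels onto the R CP components.
    first = nn.Conv2d(S, rank, kernel_size=1, bias=False)
    first.weight.data = U_s.t().reshape(rank, S, 1, 1)

    # 2) kh x 1 grouped conv: per-component vertical filtering.
    vertical = nn.Conv2d(rank, rank, kernel_size=(kh, 1),
                         padding=(layer.padding[0], 0),
                         groups=rank, bias=False)
    vertical.weight.data = U_y.t().reshape(rank, 1, kh, 1)

    # 3) 1 x kw grouped conv: per-component horizontal filtering.
    horizontal = nn.Conv2d(rank, rank, kernel_size=(1, kw),
                           padding=(0, layer.padding[1]),
                           groups=rank, bias=False)
    horizontal.weight.data = U_x.t().reshape(rank, 1, 1, kw)

    # 4) 1x1 conv: recombine the R components into the T output channels.
    last = nn.Conv2d(rank, T, kernel_size=1, bias=layer.bias is not None)
    last.weight.data = U_t.reshape(T, rank, 1, 1)
    if layer.bias is not None:
        last.bias.data = layer.bias.data

    return nn.Sequential(first, vertical, horizontal, last)
```

In the paper's pipeline, the returned nn.Sequential would be swapped into the network in place of the original layer, after which the whole model is fine-tuned with standard backpropagation.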