Speeding up Convolutional Neural Networks with Low Rank Expansions


2014 | Max Jaderberg, Andrea Vedaldi, Andrew Zisserman
This paper presents two methods for accelerating convolutional neural networks (CNNs) by exploiting redundancy in filters and feature channels. The goal is to significantly speed up CNN evaluation while maintaining high accuracy. Both methods approximate the convolutional filter banks with low-rank basis filters that are separable in the spatial domain; they are architecture-agnostic and can be applied within existing CPU and GPU convolutional frameworks.

The first method, Scheme 1, approximates each filter as a linear combination of a smaller set of separable basis filters. Because each basis filter is rank-1, its 2-D convolution factors into two cheap 1-D convolutions, and the basis is shared across the whole filter bank. The second method, Scheme 2, decomposes the convolutional layer itself into two stages: a first layer of vertical (column) filters followed by a second layer of horizontal (row) filters, which additionally exploits redundancy across feature channels. Both schemes are optimized to minimize reconstruction error, either by directly approximating the original filters or by reconstructing the output of the convolutional layer on training data.

The experiments show that these methods achieve significant speedups with minimal accuracy loss: for example, a 4.5× speedup with less than a 1% drop in accuracy, while maintaining state-of-the-art performance on standard benchmarks. The methods are evaluated on a CNN for scene text character recognition, demonstrating their effectiveness in a real-world application. The proposed schemes can also be combined with other speedup techniques, making them particularly attractive for large-scale applications where computational efficiency is crucial.
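To make the core idea concrete, here is a minimal NumPy/SciPy sketch (not the authors' implementation) of the separable low-rank approximation that both schemes build on: a d × d filter is truncated to R rank-1 terms via SVD, so one 2-D convolution is replaced by R pairs of 1-D column and row convolutions, cutting the per-pixel cost from O(d²) to O(2dR). The filter size, rank, and input here are illustrative.

```python
import numpy as np
from scipy.signal import convolve2d

rng = np.random.default_rng(0)
d, R = 9, 2                         # filter size and rank (illustrative)
W = rng.standard_normal((d, d))     # one original 2-D filter
X = rng.standard_normal((64, 64))   # one input feature map

# Rank-R approximation of the filter: W ≈ sum_r s_r * u_r v_r^T.
U, s, Vt = np.linalg.svd(W)

# Exact convolution with the original full-rank filter.
Y_full = convolve2d(X, W, mode='same')

# Separable version: each rank-1 term is a d x 1 column convolution
# followed by a 1 x d row convolution (two 1-D passes per term).
Y_sep = np.zeros_like(X)
for r in range(R):
    col = (s[r] * U[:, r])[:, None]   # d x 1 vertical filter
    row = Vt[r, :][None, :]           # 1 x d horizontal filter
    Y_sep += convolve2d(convolve2d(X, col, mode='same'), row, mode='same')

print("relative output error:",
      np.linalg.norm(Y_full - Y_sep) / np.linalg.norm(Y_full))
```

The remaining error comes only from the rank truncation: if R equals the full rank of W, the separable pipeline reproduces the original convolution exactly.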
The paper also discusses the advantages of separable filters over alternative acceleration methods such as FFT-based convolution, and highlights that optimizing the approximation to reconstruct the layer outputs on data, rather than the filters themselves, offers further gains in model performance.
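The contrast between the two optimization targets can be sketched as well. The simplified example below (again hypothetical, not the paper's procedure) fixes the separable bases from the SVD but refits the mixing coefficients by least squares against the true layer outputs on sample inputs, i.e. it minimizes output error instead of filter error.

```python
import numpy as np
from scipy.signal import convolve2d

rng = np.random.default_rng(1)
d, R = 9, 2
W = rng.standard_normal((d, d))                         # original 2-D filter
X = [rng.standard_normal((64, 64)) for _ in range(8)]   # sample feature maps

# Separable rank-1 bases taken from the SVD of the filter.
U, s, Vt = np.linalg.svd(W)
basis = [np.outer(U[:, r], Vt[r, :]) for r in range(R)]

# Stack the true outputs y and the per-basis responses A over all samples.
y = np.concatenate([convolve2d(x, W, mode='same').ravel() for x in X])
A = np.stack([np.concatenate([convolve2d(x, B, mode='same').ravel()
                              for x in X])
              for B in basis], axis=1)

# Filter reconstruction uses the singular values as coefficients;
# data reconstruction refits them against the actual layer outputs.
c_filter = s[:R]
c_data, *_ = np.linalg.lstsq(A, y, rcond=None)

out_err = lambda c: np.linalg.norm(A @ c - y) / np.linalg.norm(y)
print("output error, filter-reconstruction fit:", out_err(c_filter))
print("output error, data-reconstruction fit:  ", out_err(c_data))
```

By construction the data-driven fit can only lower the output error on the sample inputs, which mirrors the paper's finding that data reconstruction gives the better speed–accuracy trade-off at a given compression level.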