Quantized Convolutional Neural Networks for Mobile Devices

16 May 2016 | Jiaxiang Wu, Cong Leng, Yuhang Wang, Qinghao Hu, Jian Cheng
This paper proposes a unified framework, Quantized CNN (Q-CNN), to simultaneously accelerate and compress convolutional neural networks (CNNs) while maintaining high classification accuracy. Q-CNN quantizes the weights of both convolutional and fully-connected layers to reduce computation and memory overhead. By minimizing the estimation error of each layer's response, the framework achieves significant speed-up and compression with minimal accuracy loss. Extensive experiments on the ILSVRC-12 benchmark demonstrate that Q-CNN achieves a 4-6x speed-up and 15-20x compression with less than 1% loss in classification accuracy.

The key contributions are: a unified Q-CNN framework that accelerates and compresses both convolutional and fully-connected layers; an effective training scheme that suppresses the error accumulated across quantized layers; and an implementation on mobile devices that classifies an image within one second. The framework is evaluated on two benchmarks, MNIST and ILSVRC-12, where it outperforms existing methods in speed-up and compression rate while maintaining high accuracy, demonstrating its practicality for real-world applications.
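The core idea of quantizing a layer while approximating its response can be sketched with a toy product quantization of a fully-connected layer. This is a minimal illustration of the general technique, not the paper's exact algorithm; the helper names `kmeans`, `quantize_fc`, and `approx_response`, and all parameter choices, are hypothetical:

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Plain k-means; returns (codebook, assignments)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        assign = dists.argmin(1)
        for j in range(k):
            members = X[assign == j]
            if len(members):
                centers[j] = members.mean(0)
    return centers, assign

def quantize_fc(W, num_subspaces, k):
    """Product-quantize a weight matrix W of shape (d_in, d_out).

    The input dimension is split into subspaces; within each subspace,
    every output unit's weight sub-vector is replaced by the nearest
    of k codewords learned with k-means.
    """
    d_in, d_out = W.shape
    sub = d_in // num_subspaces
    codebooks, codes = [], []
    for m in range(num_subspaces):
        block = W[m * sub:(m + 1) * sub, :]   # (sub, d_out)
        cb, assign = kmeans(block.T, k)       # cluster output sub-vectors
        codebooks.append(cb)                  # (k, sub)
        codes.append(assign)                  # (d_out,)
    return codebooks, codes

def approx_response(x, codebooks, codes, num_subspaces, sub):
    """Approximate x @ W using one inner-product lookup table per subspace.

    Each codeword's inner product with the input sub-vector is computed
    once, then shared by every output unit assigned to that codeword --
    this sharing is the source of the speed-up.
    """
    d_out = len(codes[0])
    y = np.zeros(d_out)
    for m in range(num_subspaces):
        x_sub = x[m * sub:(m + 1) * sub]
        table = codebooks[m] @ x_sub          # (k,) lookups per subspace
        y += table[codes[m]]                  # gather per output unit
    return y

# Toy usage: with k equal to the number of output units, the
# quantization is lossless and the response is recovered exactly.
rng = np.random.default_rng(1)
W = rng.standard_normal((8, 16))
x = rng.standard_normal(8)
codebooks, codes = quantize_fc(W, num_subspaces=2, k=16)
y = approx_response(x, codebooks, codes, num_subspaces=2, sub=4)
```

Smaller codebooks (k well below the number of output units) trade accuracy for memory and compute: the weights are stored as small integer codes plus codebooks, and each forward pass costs k inner products per subspace instead of one per output unit.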