This paper introduces Quantized Convolutional Neural Networks (Q-CNN), a unified framework for accelerating and compressing convolutional neural networks (CNNs) on mobile devices. The authors quantize both the filter kernels of convolutional layers and the weighting matrices of fully-connected layers, learning the quantization so as to minimize the estimation error of each layer's response; this yields a 4-6× speed-up and 15-20× compression with minimal loss in classification accuracy. Evaluated on the ILSVRC-12 benchmark, Q-CNN substantially improves test-phase efficiency and memory consumption, and an implementation on mobile devices classifies an image within one second. The main contributions are the unified Q-CNN framework, an effective training scheme that suppresses the accumulation of quantization errors across layers, and the demonstration of substantial acceleration and compression with minimal performance degradation.
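
To make the quantization idea concrete, below is a minimal numpy sketch of product quantization applied to a fully-connected weight matrix, with the layer response recovered from small per-subspace look-up tables. The function names, subspace/codeword counts, and the plain k-means objective are illustrative assumptions on my part; the paper's actual scheme optimizes the codebooks against the estimation error of each layer's response (and fine-tunes to suppress cumulative error), not the weight-reconstruction error used here.

```python
import numpy as np

def quantize_fc_weights(W, num_subspaces=4, num_codewords=16, iters=20, seed=0):
    """Product-quantize a fully-connected weight matrix W of shape (d_in, d_out).

    The input dimension is split into `num_subspaces` equal chunks; each chunk
    of every output column is replaced by its nearest codeword from a
    per-subspace codebook learned with plain k-means. All sizes and names are
    illustrative, not the paper's settings.
    """
    rng = np.random.default_rng(seed)
    d_in, d_out = W.shape
    assert d_in % num_subspaces == 0 and num_codewords <= d_out
    sub = d_in // num_subspaces
    codebooks, assignments = [], []
    for m in range(num_subspaces):
        # Sub-vectors of all output columns in subspace m: shape (d_out, sub).
        X = W[m * sub:(m + 1) * sub, :].T
        # Naive k-means on the sub-vectors (Q-CNN instead minimizes the
        # response error, weighting the objective by the input statistics).
        C = X[rng.choice(d_out, num_codewords, replace=False)]
        for _ in range(iters):
            d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1)
            labels = d2.argmin(1)
            for k in range(num_codewords):
                if (labels == k).any():
                    C[k] = X[labels == k].mean(0)
        codebooks.append(C)
        assignments.append(labels)
    return codebooks, assignments

def quantized_response(x, codebooks, assignments):
    """Approximate x @ W: one dot product per codeword, then table look-ups."""
    sub = codebooks[0].shape[1]
    y = np.zeros(assignments[0].shape[0])
    for m, (C, labels) in enumerate(zip(codebooks, assignments)):
        lut = C @ x[m * sub:(m + 1) * sub]  # inner products with codewords
        y += lut[labels]                    # shared across all output units
    return y

# Demo: relative error of the quantized response on random data.
W = np.random.default_rng(1).normal(size=(64, 256))
x = np.random.default_rng(2).normal(size=64)
cb, asg = quantize_fc_weights(W)
exact = x @ W
rel_err = np.linalg.norm(exact - quantized_response(x, cb, asg)) / np.linalg.norm(exact)
print(f"relative response error: {rel_err:.3f}")
```

The look-up-table structure is where the speed-up comes from: each input sub-vector is multiplied with only `num_codewords` codewords instead of all `d_out` columns, and the results are reused across every output unit that shares a codeword, while storing codeword indices instead of full-precision weights gives the compression.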