25 Aug 2017 | Aojun Zhou*, Anbang Yao, Yiwen Guo, Lin Xu, and Yurong Chen
This paper introduces Incremental Network Quantization (INQ), a method for converting any pre-trained full-precision convolutional neural network (CNN) into a low-precision version whose weights are constrained to powers of two or zero, without loss of accuracy. Unlike existing approaches, which often suffer significant accuracy degradation, INQ relies on three interdependent operations: weight partition, group-wise quantization, and re-training. Weight partition divides the weights of each layer into two groups: the first group is quantized to form a low-precision base, while the second group keeps full precision and is re-trained to compensate for the quantization loss. These operations are applied iteratively to the latest re-trained group until all weights are quantized, so quantization and accuracy recovery proceed incrementally.

Extensive experiments on ImageNet with AlexNet, VGG-16, GoogLeNet, and ResNets show that INQ matches or improves upon the accuracy of the corresponding 32-bit floating-point models. For example, ResNet-18 with 4-bit, 3-bit, and even 2-bit ternary weights achieves accuracy similar to or better than its 32-bit baseline, and combining INQ with network pruning improves compression further. Overall, INQ delivers lossless quantization with 5-bit, 4-bit, 3-bit, and even 2-bit weights, outperforming existing methods in both compression and accuracy.

Because the quantized weights are powers of two or zero, multiplications can be replaced by binary bit-shift operations, making the resulting models efficient to run on hardware such as FPGAs. The method is implemented in Caffe, and the code is available at https://github.com/Zhouaojun/Incremental-Network-Quantization.
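To make the quantization step concrete, here is a minimal NumPy sketch of mapping weights to the set {0, ±2^n2, ..., ±2^n1} and of a magnitude-based weight partition that selects the next group to quantize. The names quantize_to_pow2 and inq_step, the nearest-value rounding rule, and the mask-based freezing are illustrative assumptions for this sketch, not the paper's exact Caffe implementation.

    import numpy as np

    def quantize_to_pow2(w, n1, n2):
        # Map each weight to the nearest value in {0, +/-2^n2, ..., +/-2^n1}.
        # Nearest-value rounding is a simplification of the paper's rule.
        candidates = np.array([0.0] + [2.0 ** k for k in range(n2, n1 + 1)])
        mag = np.abs(w)
        idx = np.argmin(np.abs(mag[..., None] - candidates), axis=-1)
        return np.sign(w) * candidates[idx]

    def inq_step(weights, quantized_mask, fraction, n1, n2):
        # One incremental step (sketch): pick the largest-magnitude weights
        # that are not yet quantized (magnitude-based partition is one of the
        # strategies described in the paper), quantize them, and freeze them
        # via the mask. The remaining full-precision weights would then be
        # re-trained, with gradients applied only where quantized_mask is False.
        scores = np.abs(weights).flatten()
        scores[quantized_mask.flatten()] = -np.inf   # exclude frozen weights
        k = min(int(fraction * weights.size), int((~quantized_mask).sum()))
        new_mask = np.zeros(weights.size, dtype=bool)
        new_mask[np.argsort(scores)[len(scores) - k:]] = True
        new_mask = new_mask.reshape(weights.shape)
        quantized = quantize_to_pow2(weights, n1, n2)
        return np.where(new_mask, quantized, weights), quantized_mask | new_mask

In the full method, a re-training pass over the still-full-precision weights would follow each such step, and the quantized fraction would grow according to the chosen schedule until every weight is frozen; since every nonzero quantized value is a power of two, a multiplication by such a weight reduces to a sign change plus a binary shift by the exponent.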