DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients

2 Feb 2018 | Shuchang Zhou, Yuxin Wu, Zekun Ni, Xinyu Zhou, He Wen, Yuheng Zou
DoReFa-Net is a method for training convolutional neural networks (CNNs) with low bitwidth weights, activations, and parameter gradients. The key innovation is the stochastic quantization of gradients to low bitwidth numbers during the backward pass, allowing the use of bit convolution kernels to accelerate both training and inference. This approach leverages the efficiency of bit convolutions on various hardware platforms, such as CPU, FPGA, ASIC, and GPU, to reduce the computational complexity and energy consumption of low bitwidth neural network training. Experiments on the SVHN and ImageNet datasets demonstrate that DoReFa-Net can achieve prediction accuracy comparable to 32-bit models while significantly reducing the bitwidth of weights, activations, and gradients. For example, a DoReFa-Net derived from AlexNet with 1-bit weights, 2-bit activations, and 6-bit gradients achieves 46.1% top-1 accuracy on the ImageNet validation set. The paper also explores the configuration space of bitwidths for weights, activations, and gradients, finding that gradients generally require larger bitwidth than activations, which in turn require larger bitwidth than weights. The paper includes a detailed algorithm for DoReFa-Net and discusses the impact of quantizing the first and last layers, as well as techniques to reduce run-time memory footprint.
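To make the quantization scheme more concrete, below is a minimal NumPy sketch of k-bit quantizers in the spirit of the paper's formulation: a uniform quantizer on [0, 1], a tanh-based weight quantizer rescaled to [-1, 1], a clipped activation quantizer, and a stochastic gradient quantizer that adds uniform noise before rounding. The function names, the explicit clipping, and taking the maximum over the whole gradient tensor (rather than per training example) are simplifying assumptions for illustration, not the authors' reference implementation; the straight-through estimator used during actual training is also omitted.

```python
import numpy as np


def quantize_k(x, k):
    """Quantize values in [0, 1] onto the 2**k - 1 uniform levels used by a k-bit code."""
    n = float(2 ** k - 1)
    return np.round(x * n) / n


def quantize_weights(w, k):
    """k-bit weight quantization (k > 1): squash weights with tanh,
    affinely map them into [0, 1], quantize, then rescale back to [-1, 1]."""
    t = np.tanh(w)
    x = t / (2.0 * np.max(np.abs(t))) + 0.5
    return 2.0 * quantize_k(x, k) - 1.0


def quantize_activations(a, k):
    """k-bit activation quantization: clip activations to [0, 1], then quantize."""
    return quantize_k(np.clip(a, 0.0, 1.0), k)


def quantize_gradients(dr, k, rng=None):
    """Stochastic k-bit gradient quantization: map gradients into [0, 1],
    add uniform noise of one quantization step, quantize, then map back.
    Simplification: the max is taken over the whole tensor here."""
    rng = np.random.default_rng() if rng is None else rng
    m = np.max(np.abs(dr))
    noise = (rng.uniform(size=dr.shape) - 0.5) / (2 ** k - 1)
    x = dr / (2.0 * m) + 0.5 + noise
    return 2.0 * m * (quantize_k(np.clip(x, 0.0, 1.0), k) - 0.5)
```

In training, these quantizers would be applied in the forward pass (weights and activations) and backward pass (gradients), with gradients propagated through the rounding operations via the straight-through estimator; the sketch above only illustrates the value mappings themselves.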