DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients

2 Feb 2018 | Shuchang Zhou, Yuxin Wu, Zekun Ni, Xinyu Zhou, He Wen, Yuheng Zou
DoReFa-Net is a method for training convolutional neural networks (CNNs) with low bitwidth weights and activations using low bitwidth parameter gradients. During the backward pass, parameter gradients are stochastically quantized to low bitwidth numbers before being propagated to convolutional layers. This allows DoReFa-Net to use bit convolution kernels to accelerate both training and inference, and these bit convolutions can be implemented efficiently on CPU, FPGA, ASIC, and GPU, making it possible to accelerate the training of low bitwidth neural networks on such hardware.

The method generalizes binarized neural networks to arbitrary bitwidths for weights, activations, and gradients, and uses straight-through estimators to propagate gradients through the quantization functions. The paper presents a detailed training algorithm, explores the configuration space of bitwidths for weights, activations, and gradients, and addresses the challenge of quantizing gradients to low bitwidths while maintaining prediction accuracy.

Experiments on the SVHN and ImageNet datasets show that DoReFa-Net can achieve prediction accuracy comparable to 32-bit counterparts. For example, a DoReFa-Net derived from AlexNet with 1-bit weights and 2-bit activations can be trained from scratch using 6-bit gradients to achieve 46.1% top-1 accuracy on the ImageNet validation set; this model is publicly released. The results also show that increasing the bitwidth of activations while keeping weights at 1 bit yields significant accuracy improvements, whereas quantizing the first and last layers of the network can cause significant accuracy degradation.
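The quantization functions summarized above can be sketched as follows. This is an illustrative reconstruction based on the description, not the authors' released code; the function names and the exact scaling and noise details are assumptions.

```python
import numpy as np

def quantize_k(x, k):
    """Quantize values in [0, 1] to k bits, i.e. round to the
    nearest of the 2^k - 1 uniformly spaced levels."""
    n = 2 ** k - 1
    return np.round(x * n) / n

def quantize_weights(w, k):
    """k-bit weight quantization in the spirit of the paper: map real
    weights into [0, 1] via tanh and a max-normalization, quantize,
    then map back into [-1, 1]."""
    t = np.tanh(w)
    x = t / (2.0 * np.max(np.abs(t))) + 0.5
    return 2.0 * quantize_k(x, k) - 1.0

def quantize_gradient(g, k, rng):
    """Stochastic k-bit gradient quantization: scale gradients into
    [0, 1], add uniform noise before rounding (the stochastic part),
    then rescale back to the original range."""
    m = 2.0 * np.max(np.abs(g))          # scale so g/m + 0.5 lies in [0, 1]
    x = g / m + 0.5
    noise = rng.uniform(-0.5, 0.5, size=g.shape) / (2 ** k - 1)
    q = quantize_k(np.clip(x + noise, 0.0, 1.0), k)
    return m * (q - 0.5)
```

In the backward pass, the straight-through estimator treats each quantizer as the identity function for gradient purposes, so training can proceed despite the rounding being non-differentiable.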
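To illustrate why low bitwidths enable fast kernels: a dot product between two fixed-point vectors stored as bit-planes reduces to bitwise AND plus popcount, which is cheap on all the hardware platforms mentioned. The sketch below is a minimal assumption-laden illustration (helper names are my own), not the paper's kernel implementation.

```python
def to_bitplanes(v, bits):
    """Pack a vector of unsigned fixed-point values into bit-planes:
    planes[m] is an integer whose i-th bit is bit m of element v[i]."""
    planes = []
    for m in range(bits):
        plane = 0
        for i, x in enumerate(v):
            plane |= ((x >> m) & 1) << i
        planes.append(plane)
    return planes

def bit_dot(x_bits, y_bits):
    """Dot product of two bit-plane-packed vectors: for each pair of
    planes, AND them, count the set bits, and weight by 2^(m+k)."""
    total = 0
    for m, xm in enumerate(x_bits):
        for k, yk in enumerate(y_bits):
            total += (1 << (m + k)) * bin(xm & yk).count("1")
    return total
```

For example, `bit_dot(to_bitplanes([1, 2, 3], 2), to_bitplanes([3, 1, 2], 2))` computes the ordinary dot product 1*3 + 2*1 + 3*2 = 11 using only AND and popcount on the packed planes, and its cost scales with the product of the two bitwidths, which is why low bitwidth weights and activations pay off.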
The paper concludes that DoReFa-Net is a promising method for training low bitwidth CNNs, and that further research is needed to explore its potential in different applications.