Trained Ternary Quantization


23 Feb 2017 | Chenzhuo Zhu*, Song Han, Huizi Mao, William J. Dally
Trained Ternary Quantization (TTQ) reduces the precision of neural-network weights to ternary values with minimal accuracy degradation, and even improves accuracy for some models on CIFAR-10 and ImageNet. TTQ introduces two trained, full-precision scaling coefficients per layer, W_l^p and W_l^n, and quantizes the weights of layer l to {-W_l^n, 0, +W_l^p}. Because each weight then needs only 2 bits, the model is about 16x smaller than its 32-bit full-precision counterpart, and because W_l^p and W_l^n are fixed at inference time, the advantage of few multiplications is retained. The resulting ternary models can also be viewed as sparse binary-weight networks, which could be accelerated with custom circuits.

TTQ is trained from scratch, making it as easy to use as full-precision training, and it learns both the ternary values and the ternary assignments. On CIFAR-10, the ternary models outperform their full-precision counterparts by 0.04%, 0.16%, and 0.36% for ResNet-32, ResNet-44, and ResNet-56, respectively, and achieve higher accuracy than the prior state-of-the-art ternary weight network (TWN) on ResNet-20. On ImageNet, the ternary AlexNet outperforms the full-precision model by 0.3% in Top-1 accuracy and previous ternary models by 3%, and it also outperforms binary-weight models.

Beyond accuracy, ternary weights improve spatial and energy efficiency: the zero values make the weights sparse, reducing storage and energy consumption and making TTQ well suited for deployment on mobile devices with limited power budgets.
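To make the quantization step concrete, here is a minimal NumPy sketch of layer-wise ternarization with two scaling coefficients. The threshold rule (a fixed fraction t of the layer's maximum absolute latent weight) follows the paper's description, but the function and variable names (ternarize, w_p, w_n) and the value t = 0.05 are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def ternarize(w_full, w_p, w_n, t=0.05):
    """Quantize a layer's full-precision weights to {-w_n, 0, +w_p}.

    w_full : latent full-precision weight tensor of one layer
    w_p, w_n : positive scaling coefficients learned for this layer
    t : threshold factor; weights within t * max|w_full| of zero become 0
        (t = 0.05 is an illustrative choice)
    """
    delta = t * np.max(np.abs(w_full))   # per-layer threshold
    w_t = np.zeros_like(w_full)          # zeros elsewhere -> sparsity
    w_t[w_full > delta] = w_p            # positive bucket -> +W_l^p
    w_t[w_full < -delta] = -w_n          # negative bucket -> -W_l^n
    return w_t

# Example: ternarize a small random "layer"
rng = np.random.default_rng(0)
w = rng.normal(scale=0.1, size=(4, 4))
print(ternarize(w, w_p=0.8, w_n=1.1))
```

During training, gradients computed with respect to the ternary weights are propagated both to the latent full-precision weights (updating the ternary assignments) and, accumulated per bucket, to W_l^p and W_l^n (updating the ternary values); at inference only the ternary weights and the two scalars per layer are kept.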
Understanding Trained Ternary Quantization