Trained Ternary Quantization


23 Feb 2017 | Chenzhuo Zhu*, Song Han, Huizi Mao, William J. Dally
Trained Ternary Quantization (TTQ) reduces the precision of neural-network weights to ternary values with minimal accuracy degradation, and even improves accuracy for some models on CIFAR-10 and ImageNet. TTQ introduces two trained, full-precision scaling coefficients per layer, W_l^p and W_l^n, and quantizes the weights of layer l to {-W_l^n, 0, +W_l^p}. Because each weight then needs only 2 bits, the model is about 16x smaller than its 32-bit full-precision counterpart, and because W_l^p and W_l^n are fixed at inference time, the advantage of few multiplications is retained. The resulting ternary models can also be viewed as sparse binary-weight networks, which could be accelerated with custom circuits.

TTQ is trained from scratch, making it as easy to use as full-precision training, and it learns both the ternary values and the ternary assignments. On CIFAR-10, the ternary models outperform their full-precision counterparts by 0.04%, 0.16%, and 0.36% for ResNet-32, ResNet-44, and ResNet-56, respectively, and achieve higher accuracy than the prior state-of-the-art ternary weight network (TWN) on ResNet-20. On ImageNet, the ternary AlexNet outperforms the full-precision model by 0.3% in Top-1 accuracy and previous ternary models by 3%, and it also outperforms binary-weight models.

Beyond accuracy, ternary weights improve spatial and energy efficiency: the zero values make the weights sparse, reducing storage and energy consumption and making TTQ well suited for deployment on mobile devices with limited power budgets.
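To make the quantization step concrete, here is a minimal NumPy sketch of layer-wise ternarization with two scaling coefficients. The threshold rule (a fixed fraction t of the layer's maximum absolute latent weight) follows the paper's description, but the function and variable names (ternarize, w_p, w_n) and the value t = 0.05 are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def ternarize(w_full, w_p, w_n, t=0.05):
    """Quantize a layer's full-precision weights to {-w_n, 0, +w_p}.

    w_full : latent full-precision weight tensor of one layer
    w_p, w_n : positive scaling coefficients learned for this layer
    t : threshold factor; weights within t * max|w_full| of zero become 0
        (t = 0.05 is an illustrative choice)
    """
    delta = t * np.max(np.abs(w_full))   # per-layer threshold
    w_t = np.zeros_like(w_full)          # zeros elsewhere -> sparsity
    w_t[w_full > delta] = w_p            # positive bucket -> +W_l^p
    w_t[w_full < -delta] = -w_n          # negative bucket -> -W_l^n
    return w_t

# Example: ternarize a small random "layer"
rng = np.random.default_rng(0)
w = rng.normal(scale=0.1, size=(4, 4))
print(ternarize(w, w_p=0.8, w_n=1.1))
```

During training, gradients computed with respect to the ternary weights are propagated both to the latent full-precision weights (updating the ternary assignments) and, accumulated per bucket, to W_l^p and W_l^n (updating the ternary values); at inference only the ternary weights and the two scalars per layer are kept.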
Understanding Trained Ternary Quantization