This paper proposes PACT, a novel activation quantization technique that enables neural networks to operate with ultra-low-precision weights and activations without significant accuracy degradation. PACT introduces a parameterized clipping level, α, which is optimized during training to find an appropriate quantization scale. This technique allows activations to be quantized to arbitrary bit precisions while achieving better accuracy than existing state-of-the-art quantization schemes. The authors show that both weights and activations can be quantized to 4 bits of precision while maintaining accuracy comparable to full-precision networks across a range of models and datasets. They also demonstrate that using reduced-precision compute units in hardware can yield super-linear improvements in inference performance, owing to the reduced area of accelerator compute engines and the ability to retain quantized model and activation data in on-chip memories.
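To make the mechanism concrete, the clip-then-quantize step with a trainable α can be sketched as below. This is a minimal illustration under my own assumptions (not the authors' code), written as a PyTorch-style autograd function with a straight-through estimator; names such as PACTQuantize are hypothetical.

```python
# Minimal sketch of PACT-style activation quantization (illustrative, not the paper's code).
# PACT replaces ReLU with y = clip(x, 0, alpha) = 0.5(|x| - |x - alpha| + alpha), where
# alpha is learned, then quantizes y to k bits. Gradients use a straight-through estimator.
import torch


class PACTQuantize(torch.autograd.Function):
    """Clip activations to [0, alpha] and quantize to k bits, with STE gradients."""

    @staticmethod
    def forward(ctx, x, alpha, k):
        ctx.save_for_backward(x, alpha)
        scale = (2 ** k - 1) / alpha
        y = torch.clamp(x, min=0.0, max=alpha.item())   # clipping activation
        y_q = torch.round(y * scale) / scale             # uniform k-bit levels in [0, alpha]
        return y_q

    @staticmethod
    def backward(ctx, grad_output):
        x, alpha = ctx.saved_tensors
        # STE for x: pass gradients only inside the clipping range (0, alpha)
        grad_x = grad_output * ((x > 0) & (x < alpha)).float()
        # Gradient w.r.t. alpha: 1 where the input was clipped (x >= alpha), 0 elsewhere
        grad_alpha = (grad_output * (x >= alpha).float()).sum().view(1)
        return grad_x, grad_alpha, None


# Usage sketch: alpha is a trainable parameter, initialized to a large value and
# (as described in the paper) regularized with weight decay so it does not grow unbounded.
alpha = torch.nn.Parameter(torch.tensor([10.0]))
x = torch.randn(8, 16)
y_q = PACTQuantize.apply(x, alpha, 4)   # 4-bit activations
```

Because α is a parameter of the loss, training balances the clipping error (α too small) against the quantization error (α too large), which is how PACT finds the quantization scale automatically.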
The main contributions of this work include: (1) PACT, a new activation quantization scheme that automatically optimizes the quantization scale during model training; (2) quantitative results demonstrating PACT's effectiveness across a range of models and datasets; and (3) a system performance analysis of the trade-offs between hardware complexity at different bit representations and model accuracy. The paper also compares PACT with existing quantization schemes, showing that PACT incurs lower accuracy degradation for both weights and activations, even at very low bit precisions. The results demonstrate that PACT reaches near-full-precision accuracy with 4-bit precision for both weights and activations, the lowest bit precision reported to achieve accuracy this close to full precision. Additionally, the paper shows that reduced-precision MAC units can significantly improve overall system performance by allowing more accelerator cores to fit in the same area.