A Survey of Quantization Methods for Efficient Neural Network Inference

21 Jun 2021 | Amir Gholami*, Sehoon Kim*, Zhen Dong*, Zhewei Yao*, Michael W. Mahoney, Kurt Keutzer
This article provides a comprehensive survey of quantization methods for efficient neural network inference, highlighting the advantages and disadvantages of each approach. Quantization is crucial for reducing the memory footprint and latency of neural networks, especially in resource-constrained environments. The authors discuss uniform and non-uniform quantization, symmetric and asymmetric quantization, and different quantization granularities (e.g., layerwise versus channelwise). They also cover fine-tuning methods such as Quantization-Aware Training (QAT) and Post-Training Quantization (PTQ), as well as zero-shot quantization techniques that require no access to the original training data. The article emphasizes the importance of efficient representation and manipulation of numerical values in neural networks, and shows how quantization can significantly improve performance in applications such as computer vision and natural language processing. The authors aim to provide a useful snapshot of current research on quantization for neural networks, offering insights into the latest advances and future directions.
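To make the core ideas concrete, the sketch below (not from the paper itself; the function names, the INT8 range, and the use of NumPy are choices made here for illustration) implements uniform quantization in both its symmetric and asymmetric forms: a real value x is mapped to an integer q = round(x / S) + Z, where the scale S and zero point Z are derived from the tensor's observed range.

```python
import numpy as np

def uniform_quantize(x: np.ndarray, num_bits: int = 8, symmetric: bool = True):
    """Uniform quantization q = round(x / scale) + zero_point, clipped to the
    signed integer range (int8 storage assumes num_bits <= 8).

    Symmetric mode pins zero_point to 0 and uses the range [-max|x|, max|x|];
    asymmetric mode maps [min(x), max(x)] onto the full integer range, which
    better suits skewed distributions such as post-ReLU activations."""
    qmin, qmax = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1
    if symmetric:
        scale = np.abs(x).max() / qmax
        zero_point = 0
    else:
        scale = (x.max() - x.min()) / (qmax - qmin)
        zero_point = int(round(qmin - x.min() / scale))
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.int8)
    return q, scale, zero_point

def dequantize(q: np.ndarray, scale: float, zero_point: int) -> np.ndarray:
    """Recover an approximation of the real values: x_hat = scale * (q - zero_point)."""
    return scale * (q.astype(np.float32) - zero_point)
```

For example, quantizing a weight tensor and measuring the round-trip error:

```python
w = np.random.randn(256, 256).astype(np.float32)
q, s, z = uniform_quantize(w, num_bits=8, symmetric=True)
print("max round-trip error:", np.abs(w - dequantize(q, s, z)).max())
```

The distinction between PTQ and QAT can likewise be illustrated with the "fake quantization" operator that QAT inserts into the network during training. Because rounding has zero gradient almost everywhere, gradients are passed through unchanged via the straight-through estimator (STE). The following is a minimal PyTorch sketch under assumed names and a fixed INT8 range, not the paper's reference implementation:

```python
import torch

class FakeQuantSTE(torch.autograd.Function):
    """Simulate INT8 quantization in the forward pass while letting gradients
    flow as if the operation were the identity (straight-through estimator)."""

    @staticmethod
    def forward(ctx, x: torch.Tensor, scale: float) -> torch.Tensor:
        # Quantize-dequantize: the network learns to tolerate quantization noise.
        return scale * torch.clamp(torch.round(x / scale), -128, 127)

    @staticmethod
    def backward(ctx, grad_output: torch.Tensor):
        # round() and clamp() are treated as identity here; a common refinement
        # zeroes the gradient for values that were clipped in the forward pass.
        return grad_output, None

# Usage inside a training loop: y = FakeQuantSTE.apply(weights, scale)
```

In PTQ, by contrast, the scale and zero point are chosen after training from a small calibration set, with no gradient updates at all; this is cheaper than QAT but typically incurs a larger accuracy drop at very low bit widths.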