26 Mar 2024 | Akshat Ramachandran, Zishen Wan, Geonhwa Jeong, John Gustafson, Tushar Krishna
This paper introduces Logarithmic Posits (LP), a novel data type that combines the adaptability of posits with the hardware efficiency of logarithmic number systems (LNS) for efficient deep neural network (DNN) inference. LP adapts to DNN parameter distributions by parameterizing its bit fields, enabling adaptive precision on a per-layer basis. The authors propose a genetic algorithm-based framework, LP Quantization (LPQ), that searches for optimal layer-wise LP parameters while minimizing representational divergence between the quantized and full-precision models through a global-local contrastive objective. Additionally, they design a unified mixed-precision LP accelerator (LPA) architecture comprising processing elements (PEs) that incorporate LP directly in the computational datapath.
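To give a feel for the data type, the sketch below is a minimal, illustrative decoder for a logarithmic-posit-like code. It is not the authors' implementation: it assumes posit-style fields (sign, run-length regime, es exponent bits, fraction) and, as the LNS twist, reads the fraction bits as the fractional part of a base-2 exponent rather than as a linear mantissa. Field widths and edge-case handling are simplified for clarity.

```python
def decode_lp(bits: int, nbits: int, es: int) -> float:
    """Decode an nbits-wide LP code (unsigned int) to a float.

    Hypothetical interpretation: value = (-1)^s * 2^(k*2^es + exp + frac/2^nf),
    i.e. posit-style tapered regime/exponent plus an LNS-style log fraction.
    """
    if bits == 0:
        return 0.0
    if bits == 1 << (nbits - 1):          # NaR pattern in standard posits
        return float("nan")

    sign = (bits >> (nbits - 1)) & 1
    if sign:                               # posits negate via two's complement
        bits = (-bits) & ((1 << nbits) - 1)

    # Regime: run of identical bits after the sign, ended by the opposite bit.
    body = bits & ((1 << (nbits - 1)) - 1)
    rest = nbits - 1
    first = (body >> (rest - 1)) & 1
    run = 0
    while run < rest and ((body >> (rest - 1 - run)) & 1) == first:
        run += 1
    k = (run - 1) if first else -run
    rest -= min(run + 1, rest)             # regime bits + terminating bit

    # Exponent bits (up to es of them), then the fraction.
    e_bits = min(es, rest)
    exp = (body >> (rest - e_bits)) & ((1 << e_bits) - 1) if e_bits else 0
    exp <<= (es - e_bits)                  # truncated exponent bits read as zeros
    rest -= e_bits
    frac = body & ((1 << rest) - 1)
    log_frac = frac / (1 << rest) if rest else 0.0   # LNS-style fractional exponent

    value = 2.0 ** (k * (1 << es) + exp + log_frac)
    return -value if sign else value
```

Under this reading the magnitude lives entirely in the (fixed-point) exponent, so multiplying two LP values reduces to adding their decoded exponent fields plus a sign XOR, which is the usual hardware-efficiency argument for LNS-based multiply-free PE datapaths.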
The algorithm-hardware co-design demonstrates, on average, less than a 1% drop in top-1 accuracy across various CNN and ViT models. It also achieves approximately a 2× improvement in performance per unit area and a 2.2× gain in energy efficiency compared to state-of-the-art quantization accelerators that use other data types. The LPQ framework is evaluated on a range of DNN models, including ResNet18, ResNet50, MobileNetV2, ViT-B, DeiT-S, and Swin-T, showing consistent improvements. The LPA accelerator is implemented in Verilog RTL and synthesized with Synopsys Design Compiler using a TSMC 28 nm process. The results show that LPA achieves nearly a 2× improvement in performance per unit area over ANT and BitFusion for the same architecture configuration. The study also compares the impact of different PE types on performance, accuracy, and energy efficiency, showing that LPA strikes a balanced trade-off between accuracy and efficiency. The paper concludes that the proposed algorithm-hardware co-design significantly improves performance and energy efficiency relative to state-of-the-art quantization accelerators and frameworks.