16 Feb 2024 | Dayou Du, Yijia Zhang, Shijie Cao, Jiaqi Guo, Ting Cao, Xiaowen Chu, Ningyi Xu
BitDistiller is a framework that combines Quantization-Aware Training (QAT) with Knowledge Distillation (KD) to boost the accuracy of sub-4-bit Large Language Models (LLMs). It applies asymmetric quantization and clipping to preserve the fidelity of quantized weights, and introduces a Confidence-Aware Kullback-Leibler Divergence (CAKLD) objective that makes knowledge transfer during self-distillation more effective. Empirical evaluations on general language understanding, complex reasoning, and code-generation benchmarks show that BitDistiller outperforms existing methods in both 3-bit and 2-bit configurations, while requiring less data and fewer training resources than competing QAT pipelines. By maintaining high accuracy at very low bit-widths, the framework is a cost-effective option for deploying LLMs on resource-constrained devices.
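To make the two core ingredients concrete, here is a minimal PyTorch sketch of asymmetric min-max quantization with a clipping ratio applied to the dynamic range. The function name, the per-tensor granularity, and the fixed `clip_ratio` are illustrative assumptions; the paper searches clipping thresholds and operates on weight groups.

```python
import torch

def asym_fake_quantize(w: torch.Tensor, n_bits: int = 2, clip_ratio: float = 1.0) -> torch.Tensor:
    """Asymmetric quantize-dequantize of a weight tensor (illustrative sketch).

    clip_ratio < 1.0 shrinks the min/max range before computing the scale,
    trading clipped outliers for finer resolution on the bulk of the weights.
    """
    w_max = w.max() * clip_ratio
    w_min = w.min() * clip_ratio
    qmax = 2 ** n_bits - 1
    scale = (w_max - w_min).clamp(min=1e-8) / qmax
    zero_point = torch.round(-w_min / scale)
    q = torch.clamp(torch.round(w / scale) + zero_point, 0, qmax)
    return (q - zero_point) * scale  # dequantized ("fake-quantized") weights
```

In QAT this fake-quantized weight is typically combined with a straight-through estimator, e.g. `w + (asym_fake_quantize(w) - w).detach()`, so gradients flow to the full-precision weights.

The CAKLD objective can be read as a confidence-weighted blend of forward (mean-seeking) and reverse (mode-seeking) KL divergence between the full-precision teacher and the quantized student. The sketch below assumes the mixing coefficient is the teacher's average probability on the ground-truth tokens; the helper names and the exact estimation procedure are assumptions for illustration, not the authors' reference implementation.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def estimate_confidence(teacher_logits: torch.Tensor, targets: torch.Tensor) -> float:
    """Average probability the teacher assigns to ground-truth tokens (assumed gamma)."""
    probs = F.softmax(teacher_logits, dim=-1)
    token_conf = probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    return token_conf.mean().item()

def cakld_loss(student_logits: torch.Tensor, teacher_logits: torch.Tensor, gamma: float) -> torch.Tensor:
    """Confidence-aware blend of forward KL(P||Q) and reverse KL(Q||P)."""
    log_p = F.log_softmax(teacher_logits, dim=-1)  # teacher P
    log_q = F.log_softmax(student_logits, dim=-1)  # student Q
    p, q = log_p.exp(), log_q.exp()
    forward_kl = (p * (log_p - log_q)).sum(-1).mean()
    reverse_kl = (q * (log_q - log_p)).sum(-1).mean()
    return gamma * forward_kl + (1.0 - gamma) * reverse_kl
```

The intuition behind the blend: when the teacher is confident, mean-seeking forward KL pushes the student to cover the full teacher distribution; when it is uncertain, mode-seeking reverse KL keeps the student from spreading mass over low-probability tokens.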