19 Jul 2024 | Wei Huang, Xingyu Zheng, Xudong Ma, Haotong Qin, Chengtao Lv, Hong Chen, Jie Luo, Xiaojuan Qi, Xianglong Liu, Michele Magno
The paper "An Empirical Study of LLaMA3 Quantization: From LLMs to MLLMs" by Wei Huang et al. evaluates the performance of LLaMA3, a powerful open-source Large Language Model (LLM) and its multimodal counterpart LLaVA-Next-8B, under low-bit quantization. LLaMA3, released by Meta in 2024, has achieved state-of-the-art performance across various tasks through extensive pre-training on over 15 trillion data tokens. The study explores the impact of low-bit quantization on LLaMA3, focusing on post-training quantization (PTQ) and LoRA-finetuning (LoRA-FT) methods. The evaluation covers a wide range of bit-widths from 1 to 8 bits and diverse datasets, including language and visual-language benchmarks. Key findings include significant performance degradation under ultra-low bit widths (2-4 bits), particularly in linguistic and visual contexts. The study highlights the need for advanced quantization techniques to bridge the performance gap and enhance the practicality of LLaMA3 and similar models in resource-limited scenarios. The research provides valuable insights for future developments in LLM and MLLM quantization, aiming to achieve higher accuracy at lower bit widths.The paper "An Empirical Study of LLaMA3 Quantization: From LLMs to MLLMs" by Wei Huang et al. evaluates the performance of LLaMA3, a powerful open-source Large Language Model (LLM) and its multimodal counterpart LLaVA-Next-8B, under low-bit quantization. LLaMA3, released by Meta in 2024, has achieved state-of-the-art performance across various tasks through extensive pre-training on over 15 trillion data tokens. The study explores the impact of low-bit quantization on LLaMA3, focusing on post-training quantization (PTQ) and LoRA-finetuning (LoRA-FT) methods. The evaluation covers a wide range of bit-widths from 1 to 8 bits and diverse datasets, including language and visual-language benchmarks. Key findings include significant performance degradation under ultra-low bit widths (2-4 bits), particularly in linguistic and visual contexts. The study highlights the need for advanced quantization techniques to bridge the performance gap and enhance the practicality of LLaMA3 and similar models in resource-limited scenarios. The research provides valuable insights for future developments in LLM and MLLM quantization, aiming to achieve higher accuracy at lower bit widths.