Quantifying the Capabilities of LLMs across Scale and Precision

8 May 2024 | Sher Badshah, Hassan Sajjad
This paper investigates the impact of model scale and quantization on the performance of large language models (LLMs). The study evaluates two major open-source instruct model families, Llama 2 and Mistral, with parameter counts ranging from 7 billion to 70 billion, across tasks including natural language understanding, reasoning, hallucination, and misinformation detection.

The results show that larger models generally outperform smaller ones, suggesting that scale remains an important factor in enhancing performance. However, the benefits of scale are not consistent across all tasks, particularly reasoning tasks. Larger models also show exceptional resilience to precision reduction, maintaining high accuracy on many tasks even at 4-bit quantization, which makes quantization an effective way to reduce computational requirements without significantly compromising performance.

A key practical finding is that, within a fixed memory budget, deploying a larger model at 4-bit quantization is often more beneficial than using a smaller model at higher precision. The study highlights the trade-offs between performance and efficiency when deploying LLMs in resource-constrained settings, compares the effectiveness of different quantization techniques across tasks, and offers guidance for optimizing LLM deployment in real-world applications.
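The fixed-memory-budget comparison above comes down to simple arithmetic: a model's weight footprint is roughly its parameter count times its bit width. The sketch below illustrates this back-of-envelope calculation (illustrative only; it counts weights alone and ignores activations, KV cache, and quantization overheads such as scales and zero points):

```python
def weight_memory_gb(params_billions: float, bits: int) -> float:
    """Approximate memory needed to store model weights, in GB.

    params_billions: parameter count in billions (e.g. 70 for a 70B model)
    bits: precision per weight (16 for fp16, 4 for 4-bit quantization)
    """
    # params * bits / 8 bytes, with the 1e9 factors cancelling out
    return params_billions * bits / 8

# A 70B model at 4-bit fits in roughly the memory of a 13B model at fp16,
# consistent with the paper's "bigger model, lower precision" recommendation.
print(weight_memory_gb(70, 4))   # 35.0 GB
print(weight_memory_gb(13, 16))  # 26.0 GB
print(weight_memory_gb(7, 16))   # 14.0 GB
```

Under this rough accounting, a 4-bit 70B model needs about 35 GB for weights, in the same range as 16-bit models an order of magnitude smaller in parameter count.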