Quantifying the Capabilities of LLMs across Scale and Precision

8 May 2024 | Sher Badshah, Hassan Sajjad
This paper investigates the impact of model scale and quantization on the performance of large language models (LLMs). The study evaluates two major open-source instruct model families, Llama 2 and Mistral, with parameter counts ranging from 7 billion to 70 billion, across tasks including natural language understanding, reasoning, hallucination, and misinformation detection.

The results show that larger models generally outperform smaller ones, suggesting that scale remains an important factor in enhancing performance. However, the benefits of scale are not consistent across all tasks, particularly reasoning tasks. Larger models also show exceptional resilience to precision reduction, maintaining high accuracy on many tasks even at 4-bit quantization, which makes quantization an effective way to reduce computational requirements without significantly compromising performance.

A key practical finding is that, within a fixed memory budget, deploying a larger model at 4-bit quantization is often more beneficial than using a smaller model at higher precision. The study highlights the trade-offs between performance and efficiency when deploying LLMs in resource-constrained settings, compares the effectiveness of different quantization techniques across tasks, and offers guidance for optimizing LLM deployment in real-world applications.
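The fixed-memory-budget comparison above comes down to simple arithmetic: a model's weight footprint is roughly its parameter count times its bit width. The sketch below illustrates this back-of-envelope calculation (illustrative only; it counts weights alone and ignores activations, KV cache, and quantization overheads such as scales and zero points):

```python
def weight_memory_gb(params_billions: float, bits: int) -> float:
    """Approximate memory needed to store model weights, in GB.

    params_billions: parameter count in billions (e.g. 70 for a 70B model)
    bits: precision per weight (16 for fp16, 4 for 4-bit quantization)
    """
    # params * bits / 8 bytes, with the 1e9 factors cancelling out
    return params_billions * bits / 8

# A 70B model at 4-bit fits in roughly the memory of a 13B model at fp16,
# consistent with the paper's "bigger model, lower precision" recommendation.
print(weight_memory_gb(70, 4))   # 35.0 GB
print(weight_memory_gb(13, 16))  # 26.0 GB
print(weight_memory_gb(7, 16))   # 14.0 GB
```

Under this rough accounting, a 4-bit 70B model needs about 35 GB for weights, in the same range as 16-bit models an order of magnitude smaller in parameter count.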