2024 | Yuzhen Huang, Jinghan Zhang, Zifei Shan, Junxian He
This paper investigates the relationship between compression efficiency and intelligence in large language models (LLMs). The authors propose that intelligence can be measured by an LLM's ability to compress external text corpora, with higher compression efficiency indicating greater intelligence. They evaluate 31 public LLMs across 12 benchmarks spanning three key areas: knowledge and commonsense, coding, and mathematical reasoning.

The results show a strong linear relationship between LLMs' compression efficiency (measured in bits per character, where lower is better) and their benchmark performance, with a Pearson correlation coefficient of around -0.95 in each domain. On this basis, the study finds that compression efficiency is a reliable, unsupervised metric for evaluating LLMs, since it is linearly associated with model capabilities. The authors open-source their compression datasets and data collection pipelines to facilitate future research.

The findings support the view that superior compression indicates greater intelligence and that compression efficiency can serve as a stable, flexible metric for evaluating LLMs. The study also addresses potential issues such as overfitting and data leakage, highlighting the importance of using diverse and representative compression corpora to ensure accurate results. Overall, the paper provides empirical evidence that compression is closely related to intelligence and that compression efficiency is a valid metric for assessing LLMs.
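The two quantities at the heart of the study, bits per character (BPC) and the BPC-versus-benchmark correlation, can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the helper names `bits_per_character` and `pearson_r` and all the numbers below are hypothetical.

```python
import math

def bits_per_character(total_nll_nats: float, num_chars: int) -> float:
    """Convert a model's total negative log-likelihood over a corpus
    (in nats) into bits per character; lower BPC = better compression."""
    return total_nll_nats / (num_chars * math.log(2))

def pearson_r(xs: list, ys: list) -> float:
    """Plain Pearson correlation coefficient between two samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Illustrative only: a hypothetical 1,000,000-character corpus on which
# a model accrues 450,000 nats of total NLL.
bpc = bits_per_character(450_000, 1_000_000)  # ~0.65 bits/char

# Illustrative only: lower BPC pairing with higher benchmark accuracy
# yields a strongly negative correlation, as the paper reports.
bpc_scores = [0.55, 0.60, 0.68, 0.75]
bench_acc  = [0.82, 0.76, 0.61, 0.50]
r = pearson_r(bpc_scores, bench_acc)
```

In this framing, a model's loss on held-out text directly measures how compactly it could encode that text, which is why BPC functions as an unsupervised proxy for capability.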