The paper explores the relationship between compression and intelligence in the context of large language models (LLMs). It argues that learning to compress well can lead to greater intelligence, a belief supported by recent advances in language modeling. The study examines this relationship empirically by treating LLMs as data compressors and using average downstream benchmark scores as a surrogate for intelligence, focusing on three domains: knowledge and commonsense, coding, and mathematical reasoning. Across 12 benchmarks and 31 public LLMs from diverse organizations, the authors find a nearly linear correlation between LLMs' intelligence and their ability to compress external text corpora, with a Pearson correlation coefficient of around -0.95 in each domain (negative because lower compression cost corresponds to higher benchmark scores). The findings suggest that compression efficiency, an unsupervised metric derived from raw text corpora, serves as a reliable evaluation measure for LLMs. The paper also discusses the practical implications of this relationship, advocating for compression efficiency as a stable and flexible metric for assessing LLMs. Additionally, the authors open-source their compression datasets and data collection pipelines to facilitate future research.
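As a rough illustration of the evaluation idea, the sketch below measures a model's compression efficiency as bits per character (BPC) on an external corpus and correlates it with benchmark scores across models. It is a minimal sketch, assuming Hugging Face `transformers` and `scipy`; the model names, corpus, and benchmark scores are hypothetical placeholders rather than the paper's actual data, and a real evaluation would chunk the corpus into context-length windows.

```python
# Minimal sketch: an LLM's compression cost on raw text is its total
# cross-entropy in bits, normalized per character (BPC). Correlating BPC
# with benchmark scores across models mirrors the paper's linear-fit analysis.
import math

import torch
from scipy.stats import pearsonr
from transformers import AutoModelForCausalLM, AutoTokenizer


def bits_per_character(model, tokenizer, text: str) -> float:
    """Compression cost of `text` under the model, in bits per character."""
    input_ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        # With labels=input_ids, HF causal LMs return the mean token-level
        # cross-entropy (in nats) over the shifted targets.
        loss = model(input_ids, labels=input_ids).loss
    total_nats = loss.item() * (input_ids.shape[1] - 1)  # mean -> total
    return total_nats / math.log(2) / len(text)  # nats -> bits, per char


if __name__ == "__main__":
    corpus = "..."  # raw text from an external evaluation corpus (placeholder)
    model_names = ["gpt2", "gpt2-medium", "gpt2-large"]  # placeholders
    benchmark_scores = [30.0, 38.0, 44.0]  # placeholder benchmark averages

    bpcs = []
    for name in model_names:
        tok = AutoTokenizer.from_pretrained(name)
        lm = AutoModelForCausalLM.from_pretrained(name).eval()
        bpcs.append(bits_per_character(lm, tok, corpus))

    # The paper's headline finding is that this relationship is nearly
    # linear, with Pearson r around -0.95 (lower BPC, higher score).
    r, _ = pearsonr(bpcs, benchmark_scores)
    print(f"Pearson correlation between BPC and benchmark score: {r:.3f}")
```

Because BPC needs only raw text and model log-likelihoods, no labeled answers or prompt templates, it is the "unsupervised metric" the summary refers to, which is what makes it attractive as a contamination-resistant complement to benchmark suites.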