The report introduces ChatGLM, an evolving family of large language models developed by Zhipu AI and Tsinghua University. The focus is on the GLM-4 series, which includes GLM-4, GLM-4-Air, and GLM-4-9B. These models are pre-trained on ten trillion tokens, primarily in Chinese and English, and then aligned through a multi-stage post-training process that combines supervised fine-tuning with learning from human feedback. Evaluations show that GLM-4:
1. Rivals or outperforms GPT-4 on general benchmarks such as MMLU, GSM8K, MATH, BBH, GPQA, and HumanEval.
2. Matches GPT-4 Turbo in instruction following, as measured by IFEval.
3. Matches GPT-4 Turbo and Claude 3 on long-context tasks.
4. Outperforms GPT-4 in Chinese alignment, as measured by AlignBench.
The GLM-4 All Tools model is designed to understand user intent and autonomously select tools to complete complex tasks, such as browsing the web or solving math problems with a Python interpreter (a sketch of this loop follows below). The team has open-sourced several models, including ChatGLM-6B, GLM-4-9B, and CodeGeeX, which together attracted over 10 million downloads on Hugging Face in 2023. The report also details the pre-training data, architecture, and alignment techniques used, highlighting the importance of data quality and diversity. Safety and risk-mitigation measures are discussed, and the team commits to promoting accessibility and safety through its open-source efforts.
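To make the All Tools behavior concrete, below is a minimal, self-contained sketch of the kind of dispatch loop such an agent runs: the model either answers directly or requests a tool (web search or a Python interpreter), the runtime executes the requested tool, and the observation is fed back to the model until it produces a final answer. Every name here (call_model, web_search, run_python, the message format) is an illustrative stand-in, not the actual GLM-4 All Tools API.

```python
import subprocess
import sys

def call_model(messages: list[dict]) -> dict:
    """Toy stand-in for a chat-completion call (NOT the real GLM-4 API).

    For this demo it requests the Python tool once, then wraps the tool's
    output as the final answer. A real system would query a model endpoint
    and parse its structured tool-call response here.
    """
    if any(m["role"] == "tool" for m in messages):
        return {"content": f"Result: {messages[-1]['content'].strip()}"}
    return {"tool": "run_python", "arguments": {"code": "print(2 ** 10)"}}

def web_search(query: str) -> str:
    """Stand-in browsing tool; a real one would fetch and summarize pages."""
    return f"(search results for {query!r})"

def run_python(code: str) -> str:
    """Run model-written Python in a subprocess and capture its output.

    Note: a production system would sandbox this step; executing
    untrusted model-written code directly is unsafe.
    """
    result = subprocess.run([sys.executable, "-c", code],
                            capture_output=True, text=True, timeout=30)
    return result.stdout or result.stderr

TOOLS = {"web_search": web_search, "run_python": run_python}

def solve(user_request: str, max_steps: int = 8) -> str:
    """Ask the model; whenever it requests a tool, run it and feed back the result."""
    messages = [{"role": "user", "content": user_request}]
    for _ in range(max_steps):
        reply = call_model(messages)
        if "tool" in reply:  # the model chose a tool instead of answering
            observation = TOOLS[reply["tool"]](**reply["arguments"])
            messages.append({"role": "tool", "content": observation})
        else:  # final answer reached
            return reply["content"]
    return "step budget exhausted"

if __name__ == "__main__":
    print(solve("What is 2 to the 10th power?"))  # -> Result: 1024
```

Running the sketch prints "Result: 1024": the toy model routes the arithmetic to the Python tool, the runtime executes it, and the answer is assembled from the observation, mirroring the intent-to-tool-to-answer flow the report describes.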