25 Oct 2023 | Aohan Zeng, Xiao Liu, Zhengxia Du, Zihan Wang, Hanyu Lai, Ming Ding, Zhuoyi Yang, Yifan Xu, Wendi Zheng, Xiao Xia, Weng Lam Tam, Zixuan Ma, Yufei Xue, Jidong Zhai, Wenguang Chen, Peng Zhang, Yuxiao Dong, Jie Tang
GLM-130B is a bilingual (English and Chinese) pre-trained language model with 130 billion parameters. The authors aim to open-source a 100B-scale model that performs at least as well as GPT-3 (davinci) and to document the challenges of training models at this scale. The paper covers the design choices, the strategies used for training efficiency and stability, and the engineering effort behind the run. GLM-130B outperforms GPT-3 175B on a wide range of benchmarks and consistently outperforms ERNIE TITAN 3.0 260B, the largest Chinese language model. It also supports INT4 weight quantization without post-training, enabling inference on affordable GPUs. The model and code are publicly available, offering insights into LLM architecture, pre-training objectives, training stability, and efficient inference.
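To make the INT4 point concrete, below is a minimal sketch of symmetric (absmax) weight-only quantization, the general idea behind compressing weights to 4 bits without a post-training calibration pass. The function names, the per-row scaling granularity, and the NumPy setting are illustrative assumptions, not the paper's exact recipe or kernels.

```python
import numpy as np

def quantize_int4(weight: np.ndarray):
    """Quantize a 2-D float weight matrix to signed INT4 values in [-8, 7].

    Uses one absmax scale per output row (a common, simple choice; the
    actual per-channel/per-group granularity is an assumption here).
    """
    scale = np.abs(weight).max(axis=1, keepdims=True) / 7.0
    # Values are stored in int8 containers; only 4 bits of range are used.
    q = np.clip(np.round(weight / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize_int4(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    """Recover an approximate float matrix for use at inference time."""
    return q.astype(np.float32) * scale

if __name__ == "__main__":
    w = np.random.randn(4, 8).astype(np.float32)
    q, s = quantize_int4(w)
    w_hat = dequantize_int4(q, s)
    print("max abs reconstruction error:", np.abs(w - w_hat).max())
```

Weight-only quantization like this shrinks memory roughly 4x relative to FP16, which is what lets a 130B-parameter model fit on commodity GPUs; activations are typically kept in higher precision.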