Qwen2 Technical Report

18 Jul 2024 | An Yang, Baosong Yang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Zhou, Chengpeng Li, Chengyuan Li, Dayiheng Liu, Fei Huang, Guanting Dong, Haoran Wei, Huan Lin, Jialong Tang, Jialin Wang, Jian Yang, Jianhong Tu, Jianwei Zhang, Jianxin Ma, Jianxin Yang, Jin Xu, Jingren Zhou, Jinze Bai, Jinzheng He, Junyang Lin, Kai Dang, Keming Lu, Keqin Chen, Kexin Yang, Mei Li, Mingfeng Xue, Na Ni, Pei Zhang, Peng Wang, Ru Peng, Rui Men, Ruize Gao, Runji Lin, Shijie Wang, Shuai Bai, Sinan Tan, Tianhang Zhu, Tianhao Li, Tianyu Liu, Wenbin Ge, Xiaodong Deng, Xiaohuan Zhou, Xingzhang Ren, Xinyu Zhang, Xipin Wei, Xuancheng Ren, Xuejing Liu, Yang Fan, Yang Yao, Yichang Zhang, Yu Wan, Yunfei Chu, Yuqiong Liu, Zeyu Cui, Zhenru Zhang, Zhifang Guo, and Zhihao Fan
This technical report introduces the Qwen2 series, a comprehensive suite of foundational and instruction-tuned language models with parameter counts ranging from 0.5 to 72 billion. The models, which include dense and Mixture-of-Experts (MoE) architectures, surpass previous open-weight models, including Qwen1.5, and perform competitively against proprietary models across various benchmarks in language understanding, generation, multilingual proficiency, coding, mathematics, and reasoning. The flagship model, Qwen2-72B, achieves notable scores on multiple benchmarks, and the instruction-tuned variant, Qwen2-72B-Instruct, excels in specific tasks like MT-Bench, Arena-Hard, and LiveCodeBench. Qwen2 also demonstrates robust multilingual capabilities, supporting approximately 30 languages. To promote community innovation and accessibility, the Qwen2 model weights are openly available on Hugging Face and ModelScope, along with supplementary materials and resources for quantization, fine-tuning, and deployment. The report details the model architecture, pre-training and post-training processes, and comprehensive evaluation protocols, highlighting Qwen2's superior performance and its potential for diverse applications and research.
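
As a rough illustration of the open availability noted above, the sketch below loads an instruction-tuned Qwen2 checkpoint from Hugging Face with the transformers library and runs a single chat turn. It is not taken from the report itself; the repository id Qwen/Qwen2-7B-Instruct and the generation settings are assumptions chosen to keep the example small and self-contained.

# Minimal sketch (assumed setup, not from the report): load an open Qwen2
# instruction-tuned checkpoint and generate a reply to one chat message.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2-7B-Instruct"  # assumed repo id; larger variants follow the same pattern
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # pick the dtype stored in the checkpoint
    device_map="auto",    # requires the accelerate package; remove to load on CPU
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize the Mixture-of-Experts idea in one paragraph."},
]

# Build the chat-formatted prompt expected by the instruction-tuned model.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Illustrative generation settings; tune max_new_tokens and sampling as needed.
output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))

The same pattern applies to the base (non-instruct) checkpoints, except that a plain prompt replaces the chat template.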
Understanding Qwen2 Technical Report