The Qwen Team at Alibaba Group introduces QWEN, a comprehensive series of large language models (LLMs) designed for a broad range of natural language processing tasks. QWEN includes base pretrained models and chat models finetuned with human alignment techniques, such as supervised finetuning (SFT) and reinforcement learning from human feedback (RLHF). The base models achieve strong performance across various downstream tasks, while the chat models, particularly those trained with RLHF, are highly competitive. Specialized models for coding (CODE-QWEN) and mathematics (MATH-QWEN-CHAT) have also been developed and show significant improvements over comparable open-source models. The report details the pretraining, alignment, and specialized model development processes, along with experimental results and evaluations. QWEN's performance is compared against other LLMs, including proprietary models such as GPT-4, and its capabilities in tool use, code interpretation, and agent applications are highlighted. The team emphasizes the importance of human evaluation and the need for new evaluation methods tailored to aligned models.