DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence

17 Jun 2024 | Qihao Zhu*, Daya Guo*, Zhihong Shao*, Dejian Yang*, Peiyi Wang, Runxin Xu, Y. Wu, Yukun Li, Huazuo Gao, Shirong Ma, Wangding Zeng, Xiao Bi, Zihui Gu, Hanwei Xu, Damai Dai, Kai Dong, Liyue Zhang, Yishi Piao, Zhibin Gou, Zhenda Xie, Zhewen Hao, Bingxuan Wang, Junxiao Song, Deli Chen, Xin Xie, Kang Guan, Yuxiang You, Aixin Liu, Qiushi Du, Wenjun Gao, Xuan Lu, Qinyu Chen, Yaohui Wang, Chengqi Deng, Jiashi Li, Chenggang Zhao, Chong Ruan, Fuli Luo, Wenfeng Liang
DeepSeek-Coder-V2 is an open-source Mixture-of-Experts (MoE) code language model that achieves performance comparable to GPT4-Turbo in code-specific tasks. It is further pre-trained from an intermediate checkpoint of DeepSeek-V2 with an additional 6 trillion tokens, enhancing its coding and mathematical reasoning capabilities while maintaining comparable performance in general language tasks. Compared to DeepSeek-Coder-33B, DeepSeek-Coder-V2 demonstrates significant advancements in various aspects of code-related tasks and reasoning. The model supports 338 programming languages and extends the context length from 16K to 128K tokens. In standard benchmark evaluations, DeepSeek-Coder-V2 outperforms closed-source models such as GPT4-Turbo, Claude 3 Opus, and Gemini 1.5 Pro in coding and math benchmarks. The paper details the model's architecture, training strategy, and evaluation results, highlighting its superior performance in code generation, code completion, code fixing, code understanding, mathematical reasoning, and general natural language tasks.
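
Since the released checkpoints are open source, they can be queried locally for tasks such as code generation. The snippet below is a minimal sketch using the Hugging Face `transformers` library; the repository name, prompt, and generation settings are illustrative assumptions rather than details taken from the paper, so substitute the checkpoint actually published by DeepSeek-AI.

```python
# Minimal sketch: prompting a DeepSeek-Coder-V2 checkpoint for code generation.
# The model_id below is an assumption for illustration, not taken from the paper.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct"  # assumed repository name

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

# Chat-style prompt asking for a small coding task.
messages = [
    {"role": "user",
     "content": "Write a Python function that checks whether a string is a palindrome."}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Greedy decoding; only the newly generated tokens are decoded and printed.
outputs = model.generate(inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```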