DeepSeek LLM: Scaling Open-Source Language Models with Longtermism

5 Jan 2024 | Xiao Bi, Deli Chen, Guanting Chen, Shanhuang Chen, Damai Dai, Chengqi Deng, Honghui Ding, Kai Dong, Qiushi Du, Zhe Fu, Huazuo Gao, Kaige Gao, Wenjun Gao, Ruiqi Ge, Kang Guan, Daya Guo, Jianzhong Guo, Guangbo Hao, Zhewen Hao, Ying He, Wenjie Hu, Panpan Huang, Erhang Li, Guowei Li, Jiashi Li, Yao Li, Y.K. Li, Wenfeng Liang, Fangyun Lin, A.X. Liu, Bo Liu, Wen Liu, Xiaodong Liu, Xin Liu, Yiyuan Liu, Haoyu Lu, Shanghao Lu, Fuli Luo, Shirong Ma, Xiaotao Nie, Tian Pei, Yishi Piao, Junjie Qiu, Hui Qu, Tongzheng Ren, Zehui Ren, Chong Ruan, Zhangli Sha, Zhihong Shao, Junxiao Song, Xuecheng Su, Jingxiang Sun, Yaofeng Sun, Minghui Tang, Bingxuan Wang, Peiyi Wang, Shiyu Wang, Yaohui Wang, Yongji Wang, Tong Wu, Y. Wu, Xin Xie, Zhenda Xie, Ziwei Xie, Yiliang Xiong, Hanwei Xu, R.X. Xu, Yanhong Xu, Dejian Yang, Yuxiang You, Shuiping Yu, Xingkai Yu, B. Zhang, Haowei Zhang, Lecong Zhang, Liyue Zhang, Mingchuan Zhang, Minghua Zhang, Wentao Zhang, Yichao Zhang, Chenggang Zhao, Yao Zhao, Shangyan Zhou, Shunfeng Zhou, Qihao Zhu, Yuheng Zou
DeepSeek LLM is an open-source project aimed at advancing large language models (LLMs) with a long-term perspective. The project scales LLMs in two prevalent configurations, 7B and 67B parameters, pre-trained on a continuously growing dataset that currently comprises 2 trillion tokens. Supervised fine-tuning (SFT) and direct preference optimization (DPO) are then applied to the base models to produce the DeepSeek Chat models. Evaluation results show that DeepSeek LLM 67B outperforms LLaMA-2 70B across various benchmarks, especially in code, mathematics, and reasoning, while open-ended evaluations reveal that DeepSeek LLM 67B Chat performs better than GPT-3.5 in both Chinese and English tasks.

The project investigates scaling laws for hyperparameters as well as for model and data scale across different datasets. It finds that the optimal model/data scaling allocation depends on data quality: higher-quality data calls for devoting a larger share of the compute budget to model scaling. Scaling laws for hyperparameters are also derived, showing that the optimal batch size increases with the compute budget while the optimal learning rate decreases. In addition, the project introduces a new model-scale representation, non-embedding FLOPs/token (M), which leads to more accurate scaling strategies than parameter-count-based representations.
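As a rough illustration of these relations, the sketch below computes the non-embedding FLOPs/token measure M for a decoder-only Transformer and applies power-law trends for the compute-optimal batch size and learning rate. The model shape, power-law coefficients, and exponents here are illustrative assumptions, not the paper's fitted values.

```python
# Illustrative sketch of the scaling relations summarized above.
# Assumptions: the power-law coefficients/exponents are placeholders, and the
# FLOPs/token formula counts forward + backward compute for a standard
# decoder-only Transformer (dense matmuls plus attention scores).

def non_embedding_flops_per_token(n_layer: int, d_model: int, seq_len: int) -> int:
    """Model-scale representation M: non-embedding FLOPs per token.

    72 * n_layer * d_model^2 covers the dense matmuls (attention projections
    and MLP, forward and backward); 12 * n_layer * d_model * seq_len covers
    the sequence-length-dependent attention-score computation.
    """
    return 72 * n_layer * d_model**2 + 12 * n_layer * d_model * seq_len


def optimal_hyperparams(compute_budget_flops: float,
                        batch_coef: float = 0.3, batch_exp: float = 0.33,
                        lr_coef: float = 0.3, lr_exp: float = -0.125):
    """Power-law trends: batch size grows with compute, learning rate shrinks.

    Coefficients and exponents are illustrative placeholders; the paper fits
    its own constants from small-scale grid-search experiments.
    """
    batch_size = batch_coef * compute_budget_flops**batch_exp
    learning_rate = lr_coef * compute_budget_flops**lr_exp
    return batch_size, learning_rate


if __name__ == "__main__":
    # Hypothetical 7B-class shape with a 4K context window.
    M = non_embedding_flops_per_token(n_layer=30, d_model=4096, seq_len=4096)
    tokens = 2e12                 # 2 trillion training tokens
    C = M * tokens                # compute budget C = M * D
    bs, lr = optimal_hyperparams(C)
    print(f"M = {M:.3e} FLOPs/token, C = {C:.3e} FLOPs")
    print(f"suggested batch size ~ {bs:.3e}, learning rate ~ {lr:.2e}")
```

Representing model scale as FLOPs per token rather than a parameter count folds the sequence-length-dependent attention cost into the compute estimate, which is why this representation supports more accurate scaling fits than parameter-based proxies.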
DeepSeek LLM's pre-training pipeline involves data deduplication, filtering, and remixing, with a tokenizer based on Byte-level Byte-Pair Encoding (BBPE). The model architecture follows LLaMA, with adjustments for efficiency and performance. Training uses a multi-step learning rate scheduler and runs on large-scale infrastructure with techniques such as flash attention and ZeRO-1.

The alignment process includes supervised fine-tuning and DPO to improve conversational performance. Evaluation results show that DeepSeek LLM 67B outperforms other open models across public benchmarks and in open-ended evaluations, and safety evaluations confirm that it provides harmless responses. The project also addresses the challenges of held-out evaluation, demonstrating the model's performance on unseen coding, math, and instruction-following tasks. Overall, DeepSeek LLM demonstrates strong performance across multiple domains, highlighting the importance of scaling laws and data quality in LLM development.
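The multi-step learning rate scheduler mentioned in the pre-training description can be illustrated with a short sketch: linear warmup, a long constant stage, then discrete step-downs late in training. The warmup length, stage boundaries, decay factors, and peak learning rate below are illustrative assumptions rather than the exact training configuration.

```python
# Minimal sketch of a multi-step learning rate schedule: linear warmup,
# a long constant stage, then discrete step-downs near the end of training.
# Stage boundaries (80% / 90% of total steps), decay factors (0.316 / 0.1),
# warmup length, and peak LR are assumptions chosen for illustration.

def multi_step_lr(step: int, total_steps: int, max_lr: float,
                  warmup_steps: int = 2000,
                  boundaries=(0.8, 0.9), factors=(0.316, 0.1)) -> float:
    """Return the learning rate for a given training step."""
    if step < warmup_steps:
        # Linear warmup from 0 to max_lr.
        return max_lr * step / warmup_steps
    progress = step / total_steps
    if progress < boundaries[0]:
        return max_lr                # constant stage
    if progress < boundaries[1]:
        return max_lr * factors[0]   # first step-down
    return max_lr * factors[1]       # final step-down


if __name__ == "__main__":
    total, peak = 100_000, 4.2e-4    # hypothetical step count and peak LR
    for s in (0, 1_000, 2_000, 50_000, 85_000, 95_000):
        print(f"step {s:>7}: lr = {multi_step_lr(s, total, peak):.2e}")
```

One practical appeal of a stepwise schedule over cosine decay is that the long constant-rate stage can be resumed directly when continuing pre-training on additional data, which suits a continuously expanding corpus.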