MiniCPM is a series of small language models (SLMs) with 1.2B and 2.4B non-embedding parameters that perform well within their size class and are competitive with 7B-13B large language models (LLMs). The paper shows that MiniCPM scales along both the model and data dimensions, enabling efficient study of data-model scaling laws without extensive retraining. It introduces the Warmup-Stable-Decay (WSD) learning rate scheduler (LRS), which supports continuous training and domain adaptation and makes it possible to study scaling laws with linear effort along the model axis and negligible effort along the data axis. The paper also presents the broader MiniCPM family, including MiniCPM-DPO, MiniCPM-MoE, and MiniCPM-128K, which perform strongly across a range of tasks. It further discusses MiniCPM's training dynamics, the resulting scaling law, and the family's performance on benchmark datasets. The results show that MiniCPM outperforms other SLMs on several tasks and that the WSD LRS enables efficient training and scaling. The paper concludes that MiniCPM marks a new stage in the development of small language models, demonstrating the potential of SLMs and advocating a more scientific and sustainable approach to scaling LLMs.
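The key property of a warmup-stable-decay schedule is that the learning rate stays constant after warmup, so a single stable-phase checkpoint can be branched into decay runs of different lengths, which is what makes the data-axis scaling study cheap. The following is a minimal sketch of such a schedule, not the paper's exact implementation: the parameter names (`max_lr`, `warmup_steps`, `stable_steps`, `decay_steps`) are hypothetical, and a cosine anneal stands in for whatever decay function the authors actually use.

```python
import math

def wsd_lr(step, max_lr, warmup_steps, stable_steps, decay_steps, min_lr=0.0):
    """Sketch of a warmup-stable-decay (WSD) learning-rate schedule.

    Phases:
      1. Warmup: LR rises linearly from 0 to max_lr.
      2. Stable: LR is held at max_lr; checkpoints taken here can be
         reused as branch points for decay runs of different lengths.
      3. Decay: LR anneals from max_lr toward min_lr (cosine here,
         chosen only for illustration).
    """
    if step < warmup_steps:
        # Linear warmup.
        return max_lr * step / max(1, warmup_steps)
    if step < warmup_steps + stable_steps:
        # Constant (stable) phase.
        return max_lr
    # Decay phase.
    progress = (step - warmup_steps - stable_steps) / max(1, decay_steps)
    progress = min(progress, 1.0)
    return min_lr + (max_lr - min_lr) * 0.5 * (1.0 + math.cos(math.pi * progress))

# Example: two decay branches started from the same stable-phase checkpoint,
# one decaying over 1k steps and one over 5k steps.
short_branch = [wsd_lr(s, 1e-3, 500, 10_000, 1_000) for s in range(12_000)]
long_branch = [wsd_lr(s, 1e-3, 500, 10_000, 5_000) for s in range(16_000)]
```

Because the stable phase is flat, adding more training data only requires extending the stable phase and re-running a short decay, rather than retraining the whole schedule from scratch; this is the sense in which the data-axis cost is described as negligible.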