D-CPT Law: Domain-specific Continual Pre-Training Scaling Law for Large Language Models

3 Jun 2024 | Haoran Que*, Jiaheng Liu*, Ge Zhang, Chenchen Zhang, Xingwei Qu, Yinghao Ma, Feiyu Duan, Zhiqi Bai, Jiakai Wang, Yuanxing Zhang, Xu Tan, Jie Fu, Wenbo Su, Jiamang Wang, Lin Qu, Bo Zheng
This paper introduces the Domain-specific Continual Pre-Training (D-CPT) Law, a scaling law for determining the optimal mixture ratio between general corpus and domain corpus when continually pre-training Large Language Models (LLMs) on a specific domain. Building on standard scaling laws, which predict performance from model size, dataset size, and training compute, the D-CPT Law predicts both general and domain-specific performance across different mixture ratios, model sizes, and dataset sizes using only a limited training budget. The paper also proposes the Cross-Domain D-CPT Law, which extends the D-CPT Law to cross-domain settings and predicts the D-CPT Law of a new domain with minimal training cost.

The D-CPT Law is validated through extensive experiments on six downstream domains, demonstrating its effectiveness and generalizability. The paper further discusses three practical applications: trading off general and domain-specific abilities, choosing mixture ratios when domain-specific data is limited, and allocating resources. The results show that the D-CPT Law largely removes the need for costly grid searches over mixture ratios, while the Cross-Domain D-CPT Law broadens its applicability by enabling predictions for new domains at minimal training cost. The authors conclude that the D-CPT Law is an important step toward optimizing the training of LLMs for specific downstream domains and provides a foundation for further research in this area.
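To make the fit-then-predict idea concrete, below is a minimal, hypothetical sketch of how a mixture-ratio-aware scaling law could be fitted and queried. The loss form, parameter names, and synthetic data are assumptions for illustration only and are not the parameterization proposed in the paper; the sketch only shows the workflow the D-CPT Law enables: fit a parametric loss curve on a small set of (model size, dataset size, mixture ratio, loss) measurements, then predict loss for unseen configurations instead of grid-searching mixture ratios.

```python
# Hypothetical illustration only: the D-CPT Law's actual functional form is defined in the
# paper. Here we assume a Chinchilla-style loss extended with the mixture ratio r (fraction
# of domain corpus) purely to demonstrate fitting and querying such a law.
import numpy as np
from scipy.optimize import curve_fit

def assumed_loss(X, E, A, B, alpha, beta, gamma):
    """Assumed form: L(N, D, r) = E + A / N^alpha + B / (D^beta * (r + eps)^gamma)."""
    N, D, r = X
    eps = 1e-3  # keep the term finite at r = 0
    return E + A / N**alpha + B / (D**beta * (r + eps)**gamma)

# Toy grid of (model size N, dataset size D, mixture ratio r) with synthetic losses.
rng = np.random.default_rng(0)
N = rng.uniform(1e8, 1e10, 200)
D = rng.uniform(1e9, 1e11, 200)
r = rng.uniform(0.0, 1.0, 200)
true_params = (1.7, 400.0, 600.0, 0.34, 0.28, 0.15)
L = assumed_loss((N, D, r), *true_params) + rng.normal(0.0, 0.01, 200)

# Fit the assumed law to the observed (N, D, r, L) points.
p0 = (1.0, 100.0, 100.0, 0.3, 0.3, 0.1)
params, _ = curve_fit(assumed_loss, (N, D, r), L, p0=p0, maxfev=20000)

# Query: predicted domain loss at an unseen (model size, dataset size, mixture ratio).
print(assumed_loss((7e9, 5e10, 0.6), *params))
```

In this kind of workflow, the optimal mixture ratio for a target model and data budget can then be read off by sweeping r through the fitted function rather than by training a model at every candidate ratio.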