WHEN SCALING MEETS LLM FINETUNING: THE EFFECT OF DATA, MODEL AND FINETUNING METHOD

2024 | Biao Zhang, Zhongtao Liu, Colin Cherry, Orhan Firat
This paper investigates the scaling behavior of large language model (LLM) finetuning with respect to four factors: LLM model size, pretraining data size, new finetuning parameter size, and finetuning data size. It focuses on two finetuning approaches: full-model tuning (FMT) and parameter-efficient tuning (PET), where PET covers prompt tuning and low-rank adaptation (LoRA). The experiments target data-limited scenarios, in which the LLM's parameter count far exceeds the finetuning data size, and use two sets of pretrained bilingual LLMs ranging from 1B to 16B parameters, evaluated on bilingual machine translation and multilingual summarization tasks.

The results show that LLM finetuning follows a power-based multiplicative joint scaling law between finetuning data size and each of the other scaling factors. Under this law, scaling the LLM model improves finetuning performance more than scaling the pretraining data, while scaling the PET parameters is generally ineffective.
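Concretely, the law multiplies a power-law term in the finetuning data size with a power-law term in the other scaling factor. A sketch of this form, where X stands for one scaling factor (LLM model size, pretraining data size, or PET parameter size) and D_f for the finetuning data size, is given below; the notation follows common scaling-law conventions and is a reconstruction rather than a verbatim transcription of the paper's equation:

\hat{\mathcal{L}}(X, D_f) = A \cdot \frac{1}{X^{\alpha}} \cdot \frac{1}{D_f^{\beta}} + E

Here A and E are fitted coefficients, with E capturing the irreducible loss, while the exponents \alpha and \beta govern how quickly the test loss \hat{\mathcal{L}} improves as each factor grows.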
The optimal finetuning method is highly task- and data-dependent: PET methods such as LoRA and prompt tuning outperform FMT in many data-limited cases, which makes selecting the best finetuning method for a downstream task non-trivial. These findings underscore the importance of understanding the scaling behavior of finetuning when choosing and developing finetuning methods. While larger models and more pretraining data generally improve downstream performance, the effectiveness of PET depends on the specific task and data characteristics. The study thus provides empirical evidence for the joint scaling law and shows that finetuning performance can be improved by matching the finetuning method to the task and data conditions.
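To make the law operational, the sketch below shows one hypothetical way to fit such a multiplicative form to (scaling factor, finetuning data size, loss) measurements using SciPy. The function name joint_scaling_law, the coefficient values, and the data grid are all invented for illustration; none of the numbers are the paper's measurements.

import numpy as np
from scipy.optimize import curve_fit

# Multiplicative joint scaling law: L(X, Df) = A * X^(-alpha) * Df^(-beta) + E,
# where X is one scaling factor (here, model size in billions of parameters)
# and Df is the finetuning data size. Hypothetical sketch; not the paper's code.
def joint_scaling_law(inputs, A, alpha, beta, E):
    X, Df = inputs
    return A * X ** (-alpha) * Df ** (-beta) + E

# Small grid of (model size, finetuning data size) conditions, synthetic only.
model_sizes = np.array([1.0, 2.0, 4.0, 8.0, 16.0])  # billions of parameters
data_sizes = np.array([1e3, 1e4, 1e5])              # finetuning examples
X, Df = map(np.ravel, np.meshgrid(model_sizes, data_sizes))

# Generate "observed" losses from known coefficients plus a little noise.
rng = np.random.default_rng(0)
loss = joint_scaling_law((X, Df), A=4.0, alpha=0.3, beta=0.15, E=1.0)
loss = loss + rng.normal(0.0, 0.01, size=loss.shape)

# Recover the coefficients; a close fit across the grid is what "follows a
# power-based multiplicative joint scaling law" means in practice.
(A, alpha, beta, E), _ = curve_fit(
    joint_scaling_law, (X, Df), loss, p0=(1.0, 0.1, 0.1, 0.5), maxfev=20000
)
print(f"A={A:.2f}, alpha={alpha:.3f}, beta={beta:.3f}, E={E:.2f}")

Fitting the same form separately for FMT, prompt tuning, and LoRA, and comparing the fitted exponents, is one way to quantify claims such as "model scaling benefits finetuning more than pretraining data scaling".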