This paper introduces a novel approach to selecting large language models (LLMs) for fine-tuning by leveraging the Rectified Scaling Law. The challenge of choosing the most appropriate pre-trained model is framed as predicting that model's full fine-tuning performance, a problem closely tied to scaling laws. Unlike pre-training, the fine-tuning scaling curve exhibits a previously unobserved "pre-power phase" in addition to the well-known "power phase." The authors explain why existing scaling laws fail to capture this phase transition and introduce the concept of "pre-learned data size," yielding the Rectified Scaling Law, which significantly improves theoretical and empirical fit.
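For concreteness, the rectified law can be written as a modification of the standard data-scaling form. The parameterization below is an illustrative reconstruction consistent with the summary above; the symbols B, beta, E, and the pre-learned data size D_l are assumed names, not quoted from the paper:

```latex
% Illustrative reconstruction of the Rectified Scaling Law (assumed form).
% L(D): test loss after fine-tuning on D examples
% D_l : "pre-learned data size" credited to pre-training (the rectification term)
% B, \beta, E: fitted amplitude, exponent, and irreducible loss
L(D) = \frac{B}{D_l + D^{\beta}} + E
```

When D_l = 0 this reduces to the familiar power-law form B / D^beta + E, which is why classical scaling laws have no mechanism for representing the pre-power phase.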
The Rectified Scaling Law is then used to design an LLM selection algorithm called "Accept then Stop" (AtS), which identifies a near-optimal model while consuming hundreds of times fewer resources. The algorithm iteratively fine-tunes candidate models on progressively smaller subsets of the data and uses the resulting losses to extrapolate each model's full fine-tuning performance. This approach outperforms existing methods in both accuracy and efficiency, as demonstrated through extensive experiments on 30 LLMs across three datasets.
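A minimal sketch of this fit-then-extrapolate selection loop is given below, assuming a hypothetical fine_tune_and_eval(model, subset_size) helper that returns the evaluation loss after fine-tuning on a subset of that size; it illustrates the general idea rather than reproducing the authors' exact AtS procedure:

```python
# Minimal sketch of scaling-law-based model selection, as described above.
# `fine_tune_and_eval` and the candidate-model handles are hypothetical;
# this is not the authors' reference implementation of AtS.
import numpy as np
from scipy.optimize import curve_fit


def rectified_law(D, B, Dl, beta, E):
    """Assumed form of the Rectified Scaling Law: loss as a function of data size D."""
    return B / (Dl + D ** beta) + E


def predict_full_finetune_loss(subset_sizes, subset_losses, full_size):
    """Fit the assumed law on small-subset losses, then extrapolate to the full data size."""
    popt, _ = curve_fit(
        rectified_law,
        np.asarray(subset_sizes, dtype=float),
        np.asarray(subset_losses, dtype=float),
        p0=[1.0, 100.0, 0.5, 0.1],  # rough initial guesses for B, D_l, beta, E
        maxfev=10_000,
    )
    return rectified_law(float(full_size), *popt)


def select_model(models, subset_sizes, full_size, fine_tune_and_eval):
    """Rank candidate LLMs by predicted full-fine-tuning loss (lower is better)."""
    predicted = {}
    for model in models:
        losses = [fine_tune_and_eval(model, n) for n in subset_sizes]
        predicted[model] = predict_full_finetune_loss(subset_sizes, losses, full_size)
    best = min(predicted, key=predicted.get)
    return best, predicted
```

The acceptance and stopping logic implied by the name "Accept then Stop" is omitted here; the sketch only shows the core fit-and-extrapolate ranking step.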
The study shows that the pre-power phase matters for fine-tuning because it marks a transition in how the model's performance scales with the size of the training data. By incorporating the pre-learned data size, the Rectified Scaling Law captures this phase transition and yields more accurate predictions of full fine-tuning performance. The AtS algorithm built on this law consistently selects near-optimal models under various resource constraints. Overall, the results highlight the importance of understanding the scaling behavior of LLMs during fine-tuning and provide a practical framework for selecting the most appropriate model for a given task.
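Under the assumed parameterization sketched earlier, the two phases correspond directly to the two limits of the rectification term:

```latex
% Limiting behavior of the assumed form L(D) = B / (D_l + D^{\beta}) + E.
% Pre-power phase: fine-tuning data is negligible next to pre-learned data.
D^{\beta} \ll D_l \;\;\Rightarrow\;\; L(D) \approx \frac{B}{D_l} + E \quad \text{(nearly flat)}
% Power phase: fine-tuning data dominates and the classical power law re-emerges.
D^{\beta} \gg D_l \;\;\Rightarrow\;\; L(D) \approx \frac{B}{D^{\beta}} + E
```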