Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models

2024 | Zixiang Chen, Yihe Deng, Huizhuo Yuan, Kaixuan Ji, Quanquan Gu
This paper introduces a fine-tuning method called Self-Play Fine-Tuning (SPIN) that transforms a weak Large Language Model (LLM) into a strong one without requiring additional human-annotated data. SPIN relies on a self-play mechanism in which the LLM refines its capabilities by generating its own training data and learning to distinguish its self-generated responses from human-annotated ones. Iterating this process progressively aligns the LLM's distribution with the target data distribution, allowing the model to approach human-level performance without external supervision. Theoretical analysis shows that the global optimum of the training objective is reached exactly when the LLM's distribution matches the target data distribution.

Empirically, SPIN significantly improves LLM performance across benchmarks such as the HuggingFace Open LLM Leaderboard and MT-Bench, in some cases outperforming models trained with additional human data or AI feedback. Because the training signal comes entirely from synthetic data generated by the LLM itself, SPIN requires neither new human annotations nor feedback from a stronger model, in contrast to Supervised Fine-Tuning (SFT) on extra data or Reinforcement Learning from Human Feedback (RLHF). The self-play mechanism is analogous to Generative Adversarial Networks (GANs), with the LLM acting as both generator and discriminator and refining its responses through iterative training. The study also finds that gains from further training within a single iteration saturate, making iteration across rounds essential for continued improvement. Together, these results underscore the potential of self-play to enhance LLMs without expert supervision.
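To make the self-play objective concrete, the sketch below shows one plausible form of the per-iteration training loss, assuming per-response log-probabilities have already been computed under the model being trained (pi_theta) and under the frozen previous-iteration model (pi_theta_t). The function and argument names and the default weight `lam` are illustrative, not taken from the paper's released implementation; the ground-truth response plays the "real" role and the previous model's own generation plays the "synthetic" role, mirroring the GAN-style discrimination described above.

```python
import torch
import torch.nn.functional as F

def spin_loss(logp_new_real, logp_old_real, logp_new_synth, logp_old_synth, lam=0.1):
    """Sketch of a SPIN-style per-iteration loss (logistic form).

    logp_new_* : log-probs under the model being trained, pi_theta
    logp_old_* : log-probs under the frozen previous-iteration model, pi_theta_t
    *_real     : ground-truth (human-annotated) responses
    *_synth    : responses sampled from pi_theta_t (the model's own generations)
    lam        : illustrative weight on the log-likelihood-ratio margin
    """
    margin = lam * ((logp_new_real - logp_old_real) - (logp_new_synth - logp_old_synth))
    # Logistic loss l(t) = log(1 + exp(-t)); minimizing it pushes the model to
    # assign relatively higher likelihood to human responses than to its own.
    return F.softplus(-margin).mean()

# Toy usage with random log-probabilities for a batch of 4 prompts.
batch = 4
loss = spin_loss(
    logp_new_real=torch.randn(batch),
    logp_old_real=torch.randn(batch),
    logp_new_synth=torch.randn(batch),
    logp_old_synth=torch.randn(batch),
)
print(loss.item())
```

In the full procedure described by the paper, a loss of this kind would be minimized at each iteration t, after which the updated model replaces pi_theta_t as the opponent that generates the next round of synthetic responses; this iterative handoff is what allows improvement to continue once gains within a single iteration saturate.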