2024 | Zixiang Chen*, Yihe Deng*, Huizhuo Yuan*, Kaixuan Ji, Quanquan Gu
This paper introduces a novel fine-tuning method called Self-Play Fine-Tuning (SPIN), which aims to convert a weak Large Language Model (LLM) into a strong one without additional human-annotated data. SPIN uses a self-play mechanism in which the LLM refines its capabilities by generating its own training data and learning to distinguish responses from its previous iteration from human-annotated responses. Theoretically, the training objective is shown to attain its global optimum only when the LLM's distribution matches the target data distribution. Empirical results on benchmarks including the HuggingFace Open LLM Leaderboard, MT-Bench, and datasets from Big-Bench show that SPIN significantly improves LLM performance, even outperforming models trained with direct preference optimization (DPO) supplemented with additional GPT-4 preference data. Because the model improves by playing against its own earlier iterations rather than a stronger expert opponent, SPIN offers a promising way to enhance LLMs without extensive human annotation.
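To make the self-play mechanism concrete, below is a minimal PyTorch sketch of the SPIN training objective described in the paper: a logistic loss on log-density ratios between the current model and its previous iteration, where human-annotated responses play the "real" role and the previous model's own generations play the "synthetic" role. The function name, tensor shapes, and the regularization weight `lam` are illustrative assumptions, not the authors' reference implementation.

```python
import torch
import torch.nn.functional as F

def spin_loss(logp_policy_real, logp_prev_real,
              logp_policy_syn, logp_prev_syn, lam=0.1):
    """One SPIN step on a batch of prompts.

    Each argument is a tensor of per-sequence log-probabilities:
      logp_policy_real / logp_prev_real  -- human response under the
        current policy and the previous-iteration model, respectively.
      logp_policy_syn / logp_prev_syn    -- response generated by the
        previous-iteration model, scored by the same two models.
    """
    # Log-density ratios of the current policy vs. the frozen
    # previous-iteration model, for real and synthetic responses.
    real_ratio = logp_policy_real - logp_prev_real
    syn_ratio = logp_policy_syn - logp_prev_syn
    # Logistic loss l(t) = log(1 + exp(-t)) = -logsigmoid(t): the policy
    # is pushed to raise its relative likelihood on human responses and
    # lower it on its own previous generations.
    return -F.logsigmoid(lam * (real_ratio - syn_ratio)).mean()

# Toy usage with random sequence log-probabilities for 4 prompts.
batch = [torch.randn(4) for _ in range(4)]
print(spin_loss(*batch))
```

In practice the previous-iteration model is frozen during each round; after training converges, it is replaced by the newly trained model and the next self-play round begins with freshly generated synthetic responses.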