23 May 2024 | Pengyu Cheng, Tianhao Hu, Han Xu, Zhisong Zhang, Yong Dai, Lei Han, Nan Du
This paper introduces a novel training method called Self-Play from Adversarial Language Game (SPAG) to enhance the reasoning ability of large language models (LLMs). The method builds on a two-player adversarial language game called Adversarial Taboo, in which an attacker and a defender converse around a target word that only the attacker can see. The attacker aims to induce the defender into uttering the target word unintentionally, while the defender tries to infer the word from the conversation and state it as an explicit guess without saying it by accident. Both players need strong reasoning and language skills to succeed.
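To make the game protocol concrete, here is a minimal sketch of one Adversarial Taboo episode in Python. The `attacker` and `defender` callables stand in for LLM calls, and the turn limit, guess format, and judging logic are illustrative assumptions rather than the paper's exact rules.

```python
# Minimal sketch of one Adversarial Taboo episode, following the rules
# summarized above. `attacker` and `defender` are hypothetical callables
# wrapping LLM calls; the paper's prompts and judging are more involved.

MAX_TURNS = 5  # illustrative turn limit

def play_episode(attacker, defender, target_word):
    """Play one episode; return (winner, conversation history)."""
    target = target_word.lower()
    history = []
    for _ in range(MAX_TURNS):
        # The attacker speaks first and sees the target word; the defender
        # only sees the conversation so far.
        history.append(("attacker", attacker(target_word, history)))

        utterance = defender(history)
        history.append(("defender", utterance))

        if utterance.startswith("I know the word!"):
            # Explicit guess: correct -> defender wins, wrong -> attacker wins.
            return ("defender" if target in utterance.lower() else "attacker"), history
        if target in utterance.lower():
            # The defender said the word unintentionally -> attacker wins.
            return "attacker", history
    return "tie", history  # no winner within the turn limit
```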
The SPAG method uses self-play training, where LLMs play against copies of themselves in the adversarial game. By applying reinforcement learning to the game outcomes, the method improves the LLMs' reasoning abilities across a range of benchmarks. The study uses open-source LLMs such as LLaMA-2-7B and Baichuan-2-13B and shows that their performance improves significantly over multiple self-play iterations, indicating that SPAG enhances LLM reasoning effectively and consistently.
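Learning from game outcomes can be pictured as a reward-weighted update over self-play episodes. The sketch below shows a schematic REINFORCE-style step in PyTorch; `sequence_logprob` is a hypothetical helper, and the paper's actual objective is an offline variant with additional regularization, so this is a simplified illustration rather than the paper's method.

```python
import torch

# Schematic REINFORCE-style update on self-play game outcomes. This is a
# sketch under simplifying assumptions, not the paper's exact objective:
# `sequence_logprob` is a hypothetical helper returning the summed token
# log-probabilities the model assigns to a player's utterances in an episode.

def self_play_update(model, optimizer, episodes):
    """One gradient step on a batch of self-play episodes.

    episodes: list of (states, actions, reward) tuples, where reward is +1
    if the player whose moves we reinforce won the game and -1 if it lost.
    """
    optimizer.zero_grad()
    total_loss = 0.0
    for states, actions, reward in episodes:
        logp = sequence_logprob(model, states, actions)  # hypothetical helper
        # Maximize reward-weighted log-likelihood of the player's own moves:
        # winning utterances are reinforced, losing ones are suppressed.
        total_loss = total_loss - reward * logp
    (total_loss / len(episodes)).backward()
    optimizer.step()
```

Because both players are copies of the same model, each episode yields training signal for the attacker and the defender roles simultaneously.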
The paper also evaluates SPAG's effectiveness through experiments on various reasoning benchmarks and through game win rates. The results show that SPAG outperforms baselines such as Chain-of-Thought (CoT) prompting and continued supervised fine-tuning (SFT) in reasoning performance. Additionally, the study highlights SPAG's potential for improving the general language abilities of LLMs.
The paper also discusses the limitations of SPAG, including the computational resources required and the need for further research on value-function and advantage estimation. The study concludes that SPAG offers a promising approach to enhancing LLM reasoning abilities through self-play training in adversarial language games.