Enhance Reasoning for Large Language Models in the Game Werewolf

29 Mar 2024 | Shuang Wu, Liwen Zhu, Tao Yang, Shiwei Xu, Qiang Fu, Wei Yang, Haobo Fu
This paper introduces a novel framework that integrates Large Language Models (LLMs) with an external Thinker module to enhance the reasoning capabilities of LLM-based agents. Unlike traditional prompt-engineering methods, the Thinker module directly leverages knowledge from databases and employs optimization techniques to handle complex logical analysis and domain-specific knowledge.

The framework separates reasoning into two systems: System-1 tasks, handled by the LLM for intuitive work such as natural language processing, and System-2 tasks, managed by the Thinker for strategic planning and deep analysis. The framework is tested in the 9-player Werewolf game, which requires both kinds of reasoning. The Thinker is trained on data from 18,800 human sessions together with reinforcement learning, and experiments show that it significantly improves both reasoning and generation capabilities. A 6B LLM fine-tuned with the Thinker outperforms GPT-4 in most evaluations, with gains in deductive reasoning, speech generation, and online game evaluation. The paper also releases the largest dataset for social deduction games to date and discusses related work, methods, and experiments, highlighting the effectiveness of the proposed framework in social deduction games.
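The System-1/System-2 split described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation: all names (`GameState`, `thinker_decide`, `llm_generate`, `agent_turn`) are hypothetical, the Thinker is stood in by a trivial rule where the paper trains a policy on human data and reinforcement learning, and the LLM call is stubbed out.

```python
# Hedged sketch of the dual-system agent loop: an external Thinker
# (System-2) chooses the action, and the LLM (System-1) turns that
# decision into natural-language speech. All names are hypothetical.

from dataclasses import dataclass, field

@dataclass
class GameState:
    """Hypothetical summary of a Werewolf game as one agent sees it."""
    role: str
    day: int
    speech_history: list = field(default_factory=list)

def thinker_decide(state: GameState) -> str:
    """System-2 stand-in: the paper's trained Thinker maps game state to
    a strategic action; here a trivial placeholder rule does so."""
    return "vote_most_suspicious" if state.day > 1 else "observe"

def llm_generate(prompt: str) -> str:
    """System-1 stand-in: a real agent would call the fine-tuned LLM."""
    return f"[LLM speech for: {prompt}]"

def agent_turn(state: GameState) -> tuple[str, str]:
    """One turn: the Thinker plans, the LLM verbalizes the plan."""
    action = thinker_decide(state)  # System-2: strategic planning
    prompt = f"As a {state.role} on day {state.day}, justify '{action}'."
    speech = llm_generate(prompt)   # System-1: language generation
    return action, speech

action, speech = agent_turn(GameState(role="seer", day=2))
print(action)   # vote_most_suspicious
```

The design point the sketch illustrates is the division of labor: the LLM never has to perform deep strategic search itself, it only conditions its generation on the Thinker's decision.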