28 Feb 2024 | Qineng Wang, Zihao Wang, Ying Su, Hanghang Tong, Yangqiu Song
**Rethinking the Bounds of LLM Reasoning: Are Multi-Agent Discussions the Key?**
Recent progress in large language models (LLMs) suggests that multi-agent discussions enhance reasoning abilities. However, systematic experiments reveal that a single LLM agent with a strong prompt can match the performance of multi-agent discussions across a range of reasoning tasks; multi-agent discussions outperform a single agent only when no demonstrations are provided in the prompt. This study introduces CMD, a new multi-agent discussion framework that simulates human group discussions. CMD outperforms existing discussion frameworks on reasoning tasks, especially in the no-demonstration setting, and its experiments show that stronger LLMs can lift the performance of weaker ones through interaction. An error analysis identifies two common failure modes in multi-agent discussions: judge mistakes and wrong-answer propagation. The findings suggest that multi-agent discussions are not inherently superior to a single well-prompted agent, that prompt engineering can substantially enhance LLM reasoning, and that discussions are most valuable in scenarios lacking expert knowledge or detailed examples. Overall, the results indicate that CMD is a promising approach for improving LLM reasoning through structured group discussion.