3 Feb 2024 | Junyou Li, Qin Zhang, Yangbin Yu, Qiang Fu, Deheng Ye
This paper presents a study showing that simply adding more agents can significantly improve the performance of large language models (LLMs), without requiring complex methods. The key finding is that performance scales with the number of agents; the method is orthogonal to existing techniques, and the degree of improvement depends on task difficulty. Comprehensive experiments on various LLM benchmarks confirm this, showing that increasing the number of agents enhances performance across a wide range of tasks. Surprisingly, ensembles of smaller LLMs can achieve results comparable to, or better than, larger models. The method itself is simple, involving only sampling and voting, and can be combined with other methods for further gains. The results show that the approach is effective on reasoning, generation, and code tasks, and that performance gains are more significant on harder tasks. The study also identifies three dimensions of task difficulty (inherent difficulty, number of reasoning steps, and the prior probability of a correct answer) and shows that performance gains increase along each of them. Because the method is compatible with various existing techniques, it can be used to enhance their performance as well. The study concludes that more agents may be all you need to improve LLM performance, without resorting to complex methods.