7 Jun 2024 | Junlin Wang, Jue Wang, Ben Athiwaratkun, Ce Zhang, James Zou
This paper introduces a Mixture-of-Agents (MoA) approach to enhance the capabilities of large language models (LLMs) by leveraging the collective strengths of multiple LLMs through a layered architecture. The MoA framework allows each layer to consist of multiple LLM agents, where each agent uses outputs from previous layers as auxiliary information to generate responses. The method achieves state-of-the-art performance on benchmarks such as AlpacaEval 2.0, MT-Bench, and FLASK, outperforming GPT-4 Omni. For example, the MoA approach using only open-source LLMs achieves a 65.1% score on AlpacaEval 2.0, surpassing GPT-4 Omni's 57.5% score.
The MoA methodology is inspired by the concept of collaborativeness among LLMs, where models generate better responses when provided with outputs from other models. This phenomenon is validated through experiments showing that even lower-quality auxiliary responses can lead to improved performance. The MoA framework iteratively refines responses by passing outputs through multiple layers of LLM agents, with each layer selecting models based on performance and diversity criteria.
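The layered refinement described above can be sketched as a simple pipeline: each layer's agents receive the original question plus the previous layer's responses as auxiliary context, and a final aggregator synthesizes the last layer's outputs. This is a minimal illustrative sketch, not the paper's implementation; the prompt template and the callable-based agent interface are assumptions (real MoA agents are LLM API calls).

```python
# Minimal sketch of the Mixture-of-Agents (MoA) layered pipeline.
# Agents are modeled as plain callables (prompt -> response); in practice
# each would be an LLM API call. The prompt wording is an assumption.

def aggregate_prompt(question, prior_responses):
    """Build an aggregate-and-synthesize prompt from previous-layer outputs."""
    context = "\n".join(
        f"[Response {i + 1}] {r}" for i, r in enumerate(prior_responses)
    )
    return (
        f"{question}\n\nYou have been provided with responses from other "
        f"models:\n{context}\nSynthesize them into a single refined answer."
    )

def mixture_of_agents(question, layers, final_aggregator):
    """layers: a list of layers, each a list of agent callables.

    Each layer consumes the previous layer's responses as auxiliary
    information; a single aggregator produces the final answer.
    """
    prior = []  # auxiliary responses from the previous layer
    for layer in layers:
        prompt = aggregate_prompt(question, prior) if prior else question
        prior = [agent(prompt) for agent in layer]  # run this layer's agents
    return final_aggregator(aggregate_prompt(question, prior))
```

In this framing, selecting which models populate each layer (the paper's performance and diversity criteria) amounts to choosing the callables in `layers`.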
The MoA approach is evaluated on multiple benchmarks, demonstrating significant improvements in response quality. It achieves a new state-of-the-art win rate of 65.8% on AlpacaEval 2.0, surpassing the previous best of 57.5% held by GPT-4 Omni. The method is also economical: MoA-Lite outperforms GPT-4o by 1.8% on AlpacaEval 2.0 while costing less to run.
The MoA framework is compared to other methods such as Mixture-of-Experts (MoE) and LLM-based rankers, showing superior performance in terms of response quality and efficiency. The approach is also effective in reasoning tasks, such as those in the MATH dataset, demonstrating its versatility and effectiveness across various applications. The study highlights the potential of MoA to enhance the capabilities of LLMs by leveraging the strengths of multiple models through collaborative synthesis.