21 Jul 2024 | Isaac Ong†, Amjad Almahairi†, Vincent Wu, Wei-Lin Chiang, Tianhao Wu, Joseph E. Gonzalez, M Waleed Kadous, Ion Stoica
RouteLLM: Learning to Route LLMs with Preference Data
This paper presents a framework for routing queries between large language models (LLMs) to balance cost and response quality. The authors propose a router model that decides at inference time whether each query should go to a stronger or a weaker LLM. The router is trained on human preference data combined with data augmentation techniques and is evaluated on widely recognized benchmarks. The results show that the approach reduces costs substantially, by more than 2x in certain cases, without compromising response quality. The routers also transfer well, maintaining their performance even when the strong and weak models are swapped out at test time.
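At inference time, routing reduces to a simple decision rule: estimate the probability that the strong model is needed for a given query and compare it against a threshold that encodes the desired cost/quality trade-off. The sketch below illustrates this idea; the `Router` class, the `win_probability` scorer, and the toy scorer are illustrative placeholders, not the RouteLLM API.

```python
# Minimal sketch of threshold-based routing (illustrative; not the RouteLLM API).
# `win_probability` is a hypothetical scorer estimating how likely the strong
# model is to produce the preferred answer for this query.

from dataclasses import dataclass
from typing import Callable

@dataclass
class Router:
    win_probability: Callable[[str], float]  # P(strong model wins | query), in [0, 1]
    threshold: float = 0.5                   # raise to save cost, lower to favor quality

    def route(self, query: str) -> str:
        """Return which model should answer the query."""
        p_strong = self.win_probability(query)
        return "gpt-4" if p_strong >= self.threshold else "mixtral-8x7b"

# Example: a toy scorer that sends longer, more complex-looking queries to the strong model.
toy_scorer = lambda q: min(1.0, len(q.split()) / 50)
router = Router(win_probability=toy_scorer, threshold=0.6)
print(router.route("What is 2 + 2?"))                            # -> "mixtral-8x7b"
print(router.route("Write a detailed proof that " + "x " * 60))  # -> "gpt-4"
```

Calibrating the threshold is what trades cost for quality: a higher threshold routes more traffic to the cheap model, a lower one prioritizes answer quality.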
The paper introduces a principled framework for learning a binary routing function between two classes of models: strong models (e.g., GPT-4) and weak models (e.g., Mixtral-8x7B). The framework uses preference data to train the router, which is then used to route queries to the most appropriate model. The router is evaluated on benchmarks such as MMLU and MT Bench, and the results show that the framework can significantly reduce costs while maintaining response quality.
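The core training signal is pairwise preference data: for each query, a label records whether the strong model's answer was preferred over the weak model's. The following sketch shows one simple way to fit such a win-probability estimator, here a logistic regression over query embeddings; it is an illustration under assumed components (the placeholder `toy_embed` function and synthetic labels), not the authors' exact training recipe.

```python
# Hedged sketch: fitting P(strong model preferred | query) from preference labels.
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_router(queries, labels, embed):
    """Fit a binary win-probability model from pairwise preference labels."""
    X = np.stack([embed(q) for q in queries])   # one embedding per query
    y = np.asarray(labels)                      # 1 = strong model's answer was preferred
    clf = LogisticRegression(max_iter=1000)
    clf.fit(X, y)
    return clf

def route(clf, query, embed, threshold=0.5):
    p_strong = clf.predict_proba(embed(query).reshape(1, -1))[0, 1]
    return "strong" if p_strong >= threshold else "weak"

def toy_embed(q):
    # Deterministic placeholder embedding; a real router would use a sentence encoder.
    rng = np.random.default_rng(abs(hash(q)) % (2**32))
    return rng.standard_normal(16)

# Toy usage with synthetic preference labels, just to show the data flow.
rng = np.random.default_rng(0)
queries = [f"query {i}" for i in range(200)]
labels = rng.integers(0, 2, size=200)
clf = train_router(queries, labels, toy_embed)
print(route(clf, "Explain attention in transformers.", toy_embed))
```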
The paper also compares several approaches for training the router: similarity-weighted ranking, matrix factorization, a BERT classifier, and a causal LLM classifier. The best-performing routers achieve large cost savings at comparable response quality. The authors further show that their routers generalize to new strong/weak model pairs without retraining, indicating that the framework remains effective across a wide range of deployments. A sketch of one of these approaches follows.
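As one concrete example, a matrix-factorization-style router can learn a vector per model and score a query/model pair by a dot product with the query's embedding, training the score gap with a Bradley-Terry-style objective on the preference labels. The sketch below is a minimal, self-contained illustration with synthetic data and a stand-in for query embeddings; it approximates the idea rather than reproducing the paper's implementation.

```python
# Illustrative matrix-factorization router: each model m gets a learned vector w_m,
# the score for (model, query) is w_m . v_q, and the strong-vs-weak score gap is
# trained with a Bradley-Terry-style (logistic) loss on preference labels.
import torch
import torch.nn as nn

class MFRouter(nn.Module):
    def __init__(self, num_models: int, embed_dim: int):
        super().__init__()
        self.model_vecs = nn.Embedding(num_models, embed_dim)

    def score(self, model_ids: torch.Tensor, query_embs: torch.Tensor) -> torch.Tensor:
        return (self.model_vecs(model_ids) * query_embs).sum(dim=-1)

    def win_prob(self, strong_ids, weak_ids, query_embs):
        # P(strong beats weak on this query) under a Bradley-Terry model.
        diff = self.score(strong_ids, query_embs) - self.score(weak_ids, query_embs)
        return torch.sigmoid(diff)

# Toy training loop on synthetic preference data (2 models, 64-dim query embeddings).
torch.manual_seed(0)
router = MFRouter(num_models=2, embed_dim=64)
opt = torch.optim.Adam(router.parameters(), lr=1e-2)
query_embs = torch.randn(256, 64)             # stand-in for real query embeddings
labels = torch.randint(0, 2, (256,)).float()  # 1 = strong model's answer was preferred
strong = torch.zeros(256, dtype=torch.long)   # model id 0 = strong, 1 = weak
weak = torch.ones(256, dtype=torch.long)

for _ in range(200):
    opt.zero_grad()
    p = router.win_prob(strong, weak, query_embs)
    loss = nn.functional.binary_cross_entropy(p, labels)
    loss.backward()
    opt.step()
```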
The paper concludes that the proposed framework provides a clear and scalable path to enhancing routing performance for specific use cases. The results highlight the effectiveness of dataset augmentation in improving router performance, and the authors believe that this framework has the potential to provide a cost-effective yet high-performance solution for deploying LLMs.