23 Jan 2024 | Xiaoding Lu, Zongyi Liu, Adian Liusie, Vyas Raina, Vineet Mudupalli, Yuwen Zhang, William Beauchamp
The paper introduces a novel approach called *Blending* to improve the performance of conversational AI models without requiring large computational resources. Traditional conversational AI models, such as ChatGPT, demand significant compute and memory due to their large parameter counts. The authors explore whether several smaller models can collectively achieve performance comparable to or better than a single large model. Their method, *Blending*, randomly selects each response from one of multiple smaller models, forming a single chat AI that is more engaging and diverse than any component alone. Empirical evidence suggests that blending three moderate-sized models (6B/13B parameters) can match or outperform a much larger model (175B+ parameters). This hypothesis is tested through A/B testing on a large user base over 30 days, showing that the blended model achieves higher user retention and engagement while incurring significantly lower inference cost. The paper also discusses future work, including increasing the number of component models and optimizing the selection distribution to further improve conversational quality.
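The core selection scheme described above, drawing one component model at random each turn and letting it reply conditioned on the full conversation history, can be sketched minimally as follows. The class name, the `generate`-style callable interface, and the stand-in models are illustrative assumptions, not the authors' implementation:

```python
import random

class BlendedChatAI:
    """Hypothetical sketch of the Blending scheme: at each turn, one
    component model is sampled and produces the reply, so the blended
    system implicitly draws on the strengths of all components."""

    def __init__(self, models, weights=None):
        # models: list of callables mapping a conversation history to a reply.
        # weights: optional selection distribution over the components
        # (the paper suggests optimizing this as future work); None = uniform.
        self.models = models
        self.weights = weights

    def respond(self, history):
        # Sample a single component model for this turn, then generate a
        # reply conditioned on the entire conversation so far.
        model = random.choices(self.models, weights=self.weights, k=1)[0]
        return model(history)

# Usage with stand-in component models (real deployments would wrap
# moderate-sized LLMs, e.g. 6B/13B-parameter chat models):
chat = BlendedChatAI([
    lambda history: "reply from model A",
    lambda history: "reply from model B",
    lambda history: "reply from model C",
])
print(chat.respond(["Hi!"]))
```

Because every component sees the full history, including turns produced by the other models, the conversation stays coherent even though the responding model changes from turn to turn.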