Learning to Decode Collaboratively with Multiple Language Models


6 Mar 2024 | Shannon Zejiang Shen, Hunter Lang, Bailin Wang, Yoon Kim, David Sontag
We propose a method to teach multiple large language models (LLMs) to collaborate by interleaving their generations at the token level. We model the decision of which LLM generates the next token as a latent variable. By optimizing the marginal likelihood of a training set under our latent-variable model, the base LLM automatically learns when to generate itself and when to call on one of the "assistant" language models to generate, all without direct supervision. Token-level collaboration during decoding allows a fusion of each model's expertise, tailored to the specific task at hand. Our collaborative decoding is especially useful in cross-domain settings where a generalist base LLM learns to invoke domain-expert models. On instruction-following, domain-specific QA, and reasoning tasks, we show that the performance of the joint system exceeds that of the individual models. Through qualitative analysis of the learned latent decisions, we show that models trained with our method exhibit several interesting collaboration patterns, e.g., template-filling.

Concretely, we propose a latent-variable framework for collaborative generation in which the models learn to interleave their generations token by token. Each token is generated by exactly one model, so the models jointly produce the token sequence. We represent the decision of which LLM generates the next token as a latent variable, assuming no direct supervision on the per-step model choice. This lets the best collaboration pattern for a given task be learned organically from data.
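To make the objective concrete, here is a minimal sketch (not the authors' released code) of a marginal-likelihood loss with a single assistant model, assuming the latent decision is a binary "defer" gate predicted per token; the function name `marginal_nll` and the tensor layout are illustrative assumptions.

```python
# Minimal sketch of the latent-variable training objective:
# marginalize over which model generates each token.
import torch
import torch.nn.functional as F

def marginal_nll(base_logits, assistant_logits, gate_logits, targets):
    """Negative log marginal likelihood over the latent model-choice variable.

    base_logits:      (T, V) next-token logits from the trainable base LLM
    assistant_logits: (T, V) next-token logits from the frozen assistant LLM
    gate_logits:      (T,)   logits of the latent decision Z_t = "defer"
    targets:          (T,)   gold next tokens
    """
    # Per-token log-likelihood of the target under each model.
    base_ll = F.log_softmax(base_logits, dim=-1).gather(-1, targets[:, None]).squeeze(-1)
    asst_ll = F.log_softmax(assistant_logits, dim=-1).gather(-1, targets[:, None]).squeeze(-1)

    # log P(Z_t = defer) and log P(Z_t = keep) from the gate.
    log_defer = F.logsigmoid(gate_logits)
    log_keep = F.logsigmoid(-gate_logits)

    # Marginalize the latent choice:
    # P(x_t) = P(keep) * p_base(x_t) + P(defer) * p_assistant(x_t).
    marginal = torch.logsumexp(
        torch.stack([log_keep + base_ll, log_defer + asst_ll]), dim=0
    )
    return -marginal.mean()
```

Because the gate is trained only through the marginal likelihood, no per-token labels of "which model should generate here" are needed, matching the no-direct-supervision setup described above.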
In our experiments, we fine-tune models for specific tasks and evaluate them in-domain, comparing end-task performance between Co-LLM and several single- and multi-model baselines. We test on four datasets ranging from instruction following to expert problem solving, aiming to understand when and how model collaboration can be beneficial. We investigate collaboration between different models (e.g., between Llama models at multiple scales, and between models fine-tuned on different domains). Overall, we find that Co-LLM can learn successful collaborations between different base and reference models, leading to better results than tuning the base models alone.

Our results show that Co-LLM enables a modular approach to continued pretraining and task-specific fine-tuning: one can pretrain a large model on a domain-specific corpus, then fine-tune smaller models with Co-LLM to leverage the larger model's knowledge and attain improved performance on downstream tasks. Co-LLM also allows collaboration across model scales, as shown in our experiments. We compare Co-LLM with other collaborative methods, such as Proxy Tuning and Contrastive Decoding, and find that Co-LLM performs better in terms of both accuracy and efficiency. Our results also show that Co-LLM can be applied to classification tasks, where it boosts performance by enabling improved reasoning capability. We also evaluate the joint model at different deferral frequencies on small validation sets for GSM8k, MATH, and BioASQ.
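At inference time, the deferral frequency can be controlled by thresholding the learned gate. The following is a minimal decoding sketch under stated assumptions: `base` and `assistant` are Hugging Face-style causal LMs sharing a tokenizer, `gate_head` is a hypothetical trained linear head over the base model's last hidden state, batch size is 1, and `eta` is the deferral threshold swept to vary the deferral frequency.

```python
import torch

@torch.no_grad()
def collaborative_decode(base, assistant, gate_head, input_ids,
                         max_new_tokens=128, eta=0.5):
    """Interleave base and assistant generations token by token."""
    for _ in range(max_new_tokens):
        base_out = base(input_ids, output_hidden_states=True)
        # Probability of deferring the next token to the assistant,
        # predicted from the base model's final hidden state.
        p_defer = torch.sigmoid(
            gate_head(base_out.hidden_states[-1][:, -1])
        ).item()  # assumes batch size 1
        if p_defer > eta:
            # The assistant produces this token; the base model only
            # decides *when* to defer.
            logits = assistant(input_ids).logits[:, -1]
        else:
            logits = base_out.logits[:, -1]
        next_token = logits.argmax(dim=-1, keepdim=True)  # greedy for simplicity
        input_ids = torch.cat([input_ids, next_token], dim=-1)
    return input_ids
```

Raising `eta` makes the system rely more on the base model (fewer assistant calls); lowering it defers more often, which is the knob varied in the deferral-frequency evaluation mentioned above.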