The paper introduces CALM (Composition to Augment Language Models), a framework for composing a large language model (LLM) with smaller, specialized models to extend its capabilities. CALM addresses the challenge of augmenting LLMs efficiently and practically: new skills are added through composition, without modifying the weights of either existing model. The key features of CALM include:
1. **Cross-Attention**: CALM introduces a small set of learnable cross-attention (and projection) layers over the intermediate representations of the anchor LLM and the augmenting model, letting the anchor attend to, and compose with, what the augmenting model has learned (see the sketch after this list).
2. **Efficiency**: The framework extends an LLM to new tasks by reusing existing models, training only a small number of additional parameters on a modest amount of data, while preserving the existing capabilities of both models.
3. **Versatility**: CALM can be applied to diverse domains and settings, such as language inclusivity and code generation.
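To make the composition mechanism concrete, here is a minimal PyTorch sketch of one such layer, assuming the general recipe described above: a learned projection maps the augmenting model's representations into the anchor's width, and a learned cross-attention block lets the anchor's hidden states attend to them, while both base models stay frozen. The class name, dimensions, and residual wiring are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn as nn

class CrossAttentionComposer(nn.Module):
    """Hypothetical sketch of a CALM-style composition layer.

    Only the projection and cross-attention parameters here would be
    trained; the anchor and augmenting models remain frozen. Names and
    dimensions are assumptions for illustration.
    """

    def __init__(self, anchor_dim: int, aug_dim: int, num_heads: int = 4):
        super().__init__()
        # Map augmenting-model representations into the anchor's width.
        self.proj = nn.Linear(aug_dim, anchor_dim)
        # Anchor hidden states are queries; projected augmenting states
        # serve as keys and values.
        self.cross_attn = nn.MultiheadAttention(
            embed_dim=anchor_dim, num_heads=num_heads, batch_first=True
        )

    def forward(self, anchor_h: torch.Tensor, aug_h: torch.Tensor) -> torch.Tensor:
        kv = self.proj(aug_h)
        attended, _ = self.cross_attn(query=anchor_h, key=kv, value=kv)
        # Residual add: the composed state feeds the anchor's next layer.
        return anchor_h + attended

# Toy usage with mismatched widths (e.g., a small augmenting model).
anchor_h = torch.randn(2, 16, 512)  # (batch, seq, anchor_dim)
aug_h = torch.randn(2, 16, 256)     # (batch, seq, aug_dim)
out = CrossAttentionComposer(anchor_dim=512, aug_dim=256)(anchor_h, aug_h)
print(out.shape)  # torch.Size([2, 16, 512])
```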
The authors demonstrate the effectiveness of CALM through experiments on three domains:
- **Key-Value Arithmetic**: Composing an augmenting model trained on string-key-to-value mappings with an arithmetic-capable anchor LLM yields high accuracy on arithmetic expressions over those keys, a task neither model solves on its own (a toy data sketch follows this list).
- **Low-Resource Language Inclusivity**: By composing a model trained on low-resource languages with a general-purpose LLM, CALM significantly improves both translation and math word problem solving in these languages.
- **Code Understanding and Generation**: CALM enhances code completion, text-to-code, and code-to-text tasks by combining a code-specific model with a general-purpose LLM.
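As a toy illustration of the first setting, the snippet below generates data of the kind described: key-to-value mappings that only the augmenting model has seen, paired with arithmetic expressions over those keys that require the anchor's arithmetic skill. The key names and prompt format are hypothetical; the paper's actual data construction may differ.

```python
import random

# Hypothetical key -> value mappings memorized by the augmenting model.
keys = {f"K{i}": random.randint(1, 99) for i in range(1, 6)}

def make_example(mapping: dict[str, int]) -> tuple[str, int]:
    """Build one arithmetic prompt over keys and its ground-truth answer."""
    a, b = random.sample(list(mapping), 2)
    return f"{a} + {b} =", mapping[a] + mapping[b]

prompt, answer = make_example(keys)
print(keys)            # e.g. {'K1': 42, 'K2': 7, ...}
print(prompt, answer)  # e.g. K3 + K5 = 63
```

Neither base model can answer such prompts alone: the anchor does not know the key values, and the augmenting model cannot do the arithmetic; the composed model must combine both.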
The results show that CALM outperforms both individual models and fine-tuned versions of the anchor model, demonstrating its ability to effectively leverage the strengths of specialized models.