This paper introduces CALM (Composition to Augment Language Models), a framework for efficiently composing an existing foundation model with a more specialized model to enable new capabilities. CALM introduces cross-attention between the two models' intermediate representations, so that a small number of trainable parameters learns to combine the anchor and augmenting models. Key features include scaling an LLM to new tasks by reusing existing models with only a few additional parameters and a modest amount of data, preserving existing capabilities by keeping the base model weights intact, and applicability to diverse domains and settings.

The authors demonstrate that augmenting PaLM2-S with a smaller model trained on low-resource languages yields an absolute improvement of up to 13% on tasks such as translation into English and arithmetic reasoning for low-resource languages; composing this model with the LLM lets it borrow the LLM's generation and reasoning capabilities. Similarly, augmenting PaLM2-S with a model trained on open-source code across a variety of programming languages gives a relative improvement of 40% over the base model on code generation and explanation tasks, on par with fully fine-tuned counterparts, and the composed model outperforms both base models on code explanation and code completion. These results illustrate CALM's practical applications in language inclusivity and code generation.

The paper also discusses related work, including parameter-efficient fine-tuning, model merging, and model and task compositionality. In contrast to these approaches, CALM introduces a small number of trainable parameters over both the augmenting and anchor models' intermediate layer representations and finds an effective combination of the given models that performs new, challenging tasks more accurately than either model alone, while preserving the capabilities of the individual models.

Experiments are presented in three domains: (a) an anchor LLM composed with an augmenting model trained on mappings between string keys and number values, used to solve arithmetic expressions over those keys, which requires both knowledge of the key-value mappings and arithmetic capability; (b) expanding the language coverage of an anchor LLM to low-resource languages it has not seen during pre-training; and (c) improving code completion and explanation by composing an anchor LLM with an augmenting model specialized for the code domain. Across these settings, the composed model outperforms both base models, demonstrating the effectiveness of CALM in enabling new capabilities through model composition.
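As a rough illustration of the composition mechanism described above, the sketch below shows a single cross-attention block of the kind CALM places between a frozen anchor layer and a frozen augmenting layer. The class name, dimensions, scaling, and plain residual connection are assumptions made for illustration, not the paper's exact architecture; only the projection weights introduced here would be trained.

```python
import torch
import torch.nn as nn


class CALMCrossAttention(nn.Module):
    """Hypothetical sketch of one composition layer: the anchor's hidden
    states attend over the augmenting model's hidden states. Both base
    models stay frozen; only these projections are trainable."""

    def __init__(self, d_anchor: int, d_aug: int):
        super().__init__()
        # The only new parameters added by composition (illustrative).
        self.q_proj = nn.Linear(d_anchor, d_anchor, bias=False)
        self.k_proj = nn.Linear(d_aug, d_anchor, bias=False)
        self.v_proj = nn.Linear(d_aug, d_anchor, bias=False)

    def forward(self, h_anchor: torch.Tensor, h_aug: torch.Tensor) -> torch.Tensor:
        # h_anchor: (batch, seq, d_anchor) from a frozen anchor-LLM layer
        # h_aug:    (batch, seq, d_aug)    from a frozen augmenting-model layer
        q = self.q_proj(h_anchor)
        k = self.k_proj(h_aug)
        v = self.v_proj(h_aug)
        attn = torch.softmax(q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5, dim=-1)
        # Residual add keeps the anchor's original representation intact.
        return h_anchor + attn @ v


# Usage with made-up dimensions:
layer = CALMCrossAttention(d_anchor=1024, d_aug=512)
h_anchor = torch.randn(2, 16, 1024)   # stand-in for frozen anchor activations
h_aug = torch.randn(2, 16, 512)       # stand-in for frozen augmenting activations
fused = layer(h_anchor, h_aug)        # (2, 16, 1024)
```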
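To make the first experimental setting concrete, here is a purely illustrative key-value arithmetic example; the keys, values, and expression format are invented, not taken from the paper. The augmenting model is trained only on the mappings, the anchor LLM supplies the arithmetic, and only their composition can answer the query.

```python
# Illustrative only: keys, values, and the expression format are made up.
kv_store = {"K17": 7, "K04": 3, "K42": 12}  # mappings seen only by the augmenting model
query = "K17 + K04 - K42"                   # arithmetic over keys, needs both capabilities

# Ground-truth computation the composed model is expected to reproduce:
expected = kv_store["K17"] + kv_store["K04"] - kv_store["K42"]
print(expected)  # -2
```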