21 Mar 2024 | Charles Goddard, Shamane Siriwardhana, Malikeh Ehghaghi, Luke Meyers, Vlad Karpukhin, Brian Benedict, Mark McQuade, Jacob Solawetz
The paper introduces MergeKit, an open-source library designed to facilitate the merging of large language models (LLMs). The rapid growth of open-source LLMs has created a need to combine their competencies by merging their parameters. Model merging addresses challenges such as catastrophic forgetting and multitask learning, improving model performance and versatility without additional training. MergeKit offers a comprehensive, extensible framework for merging models on a wide range of hardware, supported by detailed tutorials and IPython notebooks. It supports models with both identical and differing architectures and initializations, implementing methods such as linear averaging, task arithmetic, TIES merging, and SLERP.

The library has been widely adopted: thousands of models have been merged with it, yielding powerful open-source checkpoints. MergeKit's popularity is evident in its growing GitHub star count and in the presence of merged models among the top performers on the Open LLM Leaderboard. The paper also provides practical examples, such as merging Meditron-7B with the Llama2-7B chat model, which achieves superior performance on medical benchmarks. The authors emphasize ethical considerations and community engagement in the development and use of MergeKit, aiming to democratize access to cutting-edge AI technologies.
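The merging methods named above operate directly on model parameters. As a rough illustration only (not MergeKit's actual implementation, and with hypothetical function names and toy tensors), the sketch below shows how linear averaging, task arithmetic, and SLERP could be applied to a single pair of weight tensors:

```python
import torch

def linear_merge(weights, coeffs):
    """Weighted average of parameter tensors (linear / 'model soup' style merge)."""
    return sum(c * w for c, w in zip(coeffs, weights))

def task_arithmetic_merge(base, finetuned_list, scale=1.0):
    """Add scaled task vectors (finetuned minus base) back onto the base weights."""
    task_vectors = [ft - base for ft in finetuned_list]
    return base + scale * sum(task_vectors)

def slerp_merge(w0, w1, t=0.5, eps=1e-8):
    """Spherical linear interpolation between two weight tensors (one common variant)."""
    v0, v1 = w0.flatten(), w1.flatten()
    v0n, v1n = v0 / (v0.norm() + eps), v1 / (v1.norm() + eps)
    dot = torch.clamp(torch.dot(v0n, v1n), -1.0, 1.0)
    omega = torch.acos(dot)
    if omega.abs() < eps:
        # Nearly parallel vectors: fall back to plain linear interpolation.
        return (1 - t) * w0 + t * w1
    so = torch.sin(omega)
    out = (torch.sin((1 - t) * omega) / so) * v0 + (torch.sin(t * omega) / so) * v1
    return out.reshape(w0.shape)

# Toy example on a single weight matrix; a real merge would iterate over every
# parameter tensor in the state dicts of the models being combined.
base = torch.randn(4, 4)
ft_a = base + 0.1 * torch.randn(4, 4)
ft_b = base + 0.1 * torch.randn(4, 4)

avg = linear_merge([ft_a, ft_b], [0.5, 0.5])
ta  = task_arithmetic_merge(base, [ft_a, ft_b], scale=0.7)
sl  = slerp_merge(ft_a, ft_b, t=0.5)
```

In practice, MergeKit drives such merges from declarative configuration files and handles layer-by-layer processing, so users specify the source models and method rather than writing tensor code like the above.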