22 Jan 2024 | Fanqi Wan1, Xinting Huang2, Deng Cai2, Xiaojun Quan1†, Wei Bi2, Shuming Shi2
This paper introduces the concept of knowledge fusion for large language models (LLMs), which aims to combine the capabilities of existing LLMs into a single, more capable model. Training LLMs from scratch is costly and can yield models with largely redundant capabilities. Instead, the authors propose FUSELLM (Knowledge Fusion of LLMs), which leverages the generative distributions of source LLMs to externalize their collective knowledge and unique strengths and transfer them to a target model, thereby enhancing its performance. The approach is validated with three popular LLMs of different architectures, Llama-2, MPT, and OpenLLaMA, across a range of benchmarks and tasks. The results show that FUSELLM improves the target model's performance on reasoning, commonsense, and code-generation tasks compared to the individual source LLMs and to traditional ensemble and weight-merging methods. The paper also discusses implementation details, including token alignment and fusion strategies, and contrasts FUSELLM with knowledge distillation and with ensemble and weight-merging techniques. The findings suggest that LLM fusion is a promising avenue for building more effective and efficient models.
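To make the core idea concrete, below is a minimal sketch of how fusing source-model distributions and training a target model against them might look. It assumes the source models already share an aligned vocabulary (the paper's token-alignment step is omitted), and the function names `fuse_distributions` and `fusion_loss`, the min-cross-entropy-style source selection, and the weighting parameter `lam` are illustrative assumptions rather than the paper's exact implementation.

```python
import torch
import torch.nn.functional as F

def fuse_distributions(source_logits, source_losses):
    """Fuse per-token distributions from several source LLMs.

    source_logits: list of [batch, seq, vocab] tensors (vocabularies assumed aligned).
    source_losses: list of [batch] per-example LM losses for each source model.
    Here we simply keep the distribution of the source with the lowest loss on
    each example (a min-cross-entropy-style choice); weighted averaging of the
    distributions is another possible fusion strategy.
    """
    stacked = torch.stack(source_logits, dim=0)   # [n_src, batch, seq, vocab]
    losses = torch.stack(source_losses, dim=0)    # [n_src, batch]
    best = losses.argmin(dim=0)                   # index of best source per example
    batch_idx = torch.arange(best.size(0))
    fused_logits = stacked[best, batch_idx]       # [batch, seq, vocab]
    return F.softmax(fused_logits, dim=-1)

def fusion_loss(target_logits, fused_probs, labels, lam=0.9):
    """Combine the usual causal-LM loss with a divergence term that pulls the
    target model's distributions toward the fused source distributions."""
    lm_loss = F.cross_entropy(
        target_logits.view(-1, target_logits.size(-1)), labels.view(-1)
    )
    kl = F.kl_div(
        F.log_softmax(target_logits, dim=-1), fused_probs, reduction="batchmean"
    )
    return lam * lm_loss + (1.0 - lam) * kl
```

In this sketch, continual training of the target model minimizes `fusion_loss` over a corpus, so the target learns both from the raw text and from the fused knowledge externalized by the source models.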