Evolutionary Optimization of Model Merging Recipes

19 Mar 2024 | Takuya Akiba, Makoto Shing, Yujin Tang, Qi Sun, David Ha
The paper presents a novel application of evolutionary algorithms to automate the creation of powerful foundation models by merging diverse open-source models. The authors propose an evolutionary approach that overcomes the limitations of human intuition and domain knowledge, which currently dominate model merging. Their method operates in both parameter space and data flow space, allowing for optimization beyond just the weights of individual models. This approach facilitates cross-domain merging, generating models with capabilities they were not explicitly trained for, such as a Japanese LLM with math reasoning capabilities.

Key contributions of the work include:

1. **Automated Model Composition**: Evolutionary Model Merge (EMM) automatically discovers optimal combinations of diverse open-source models to create new foundation models.
2. **Cross-Domain Merging**: The method can merge models from different domains, potentially exceeding the capabilities of conventional human design strategies.
3. **State-of-the-Art Performance**: The generated Japanese LLM with math reasoning and the Japanese VLM achieve state-of-the-art performance on various benchmarks, even surpassing models with significantly more parameters.
4. **High Efficiency and Generalizability**: The 7B parameter LLM outperforms some previous 70B parameter Japanese LLMs, highlighting the high efficiency and generalization capability of the approach.
5. **Culturally-Aware VLM**: The Japanese VLM demonstrates its effectiveness in handling culturally-specific content, outperforming previous Japanese VLMs.

The authors also discuss the background and related work, including the challenges and advancements in model merging and evolutionary neural architecture search. They detail their method, which involves evolving merging weights in parameter space and layer permutations in data flow space, and present experimental results to validate the effectiveness of their approach.
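To make the parameter-space side of this concrete, here is a minimal toy sketch of what "evolving merging weights" can look like: a simple elitist evolution strategy searches for per-layer interpolation coefficients between two small stand-in "models" (plain lists of numbers, not real LLM weights), maximizing a toy fitness. All names, the target values, and the fitness function are illustrative assumptions; the paper's actual setup uses CMA-ES over real merge parameters scored on benchmark tasks.

```python
import random

# Hypothetical stand-ins for two source models' per-layer parameters.
model_a = [0.2, 0.9, 0.4]   # e.g. strong on one task
model_b = [0.8, 0.1, 0.6]   # e.g. strong on another
target  = [0.5, 0.5, 0.5]   # toy "ideal" merged parameters, for the toy fitness

def merge(coeffs):
    # Parameter-space merging: per-layer linear interpolation between models.
    return [c * a + (1 - c) * b for c, a, b in zip(coeffs, model_a, model_b)]

def fitness(coeffs):
    # Toy objective: negative squared error to the target.
    # In the real method, fitness is the merged model's benchmark score.
    merged = merge(coeffs)
    return -sum((m - t) ** 2 for m, t in zip(merged, target))

def evolve(generations=200, pop=16, sigma=0.1, seed=0):
    # (1+lambda)-style evolution strategy with elitism: perturb the best
    # candidate with Gaussian noise, keep whichever scores highest.
    rng = random.Random(seed)
    best = [rng.random() for _ in model_a]
    for _ in range(generations):
        candidates = [
            [min(1.0, max(0.0, c + rng.gauss(0, sigma))) for c in best]
            for _ in range(pop)
        ]
        candidates.append(best)  # elitism: never lose the incumbent
        best = max(candidates, key=fitness)
    return best

best_coeffs = evolve()
```

The same loop generalizes directly: replace the toy lists with real model weights, the interpolation with a merge operator such as TIES or DARE, and the fitness with an evaluation harness.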
Finally, they discuss future directions and limitations, emphasizing the potential for evolutionary model merging to unlock new capabilities and reduce the cost of foundation model development.
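The data-flow-space side can be sketched analogously: instead of blending weights, the search chooses which model each step of the inference path draws its layer from. The toy below evolves a binary path over three stand-in "layers" (simple integer functions, not transformer blocks); the layer functions, the input, and the target value are all illustrative assumptions, and the paper's actual formulation searches over layer sequences scored on real tasks.

```python
import random

# Toy "layers": each model contributes one function per position.
layers_a = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]
layers_b = [lambda x: x * 3, lambda x: x + 2, lambda x: x // 2]

def run(path, x):
    # Data-flow-space merging: path[i] picks which model's i-th layer to apply.
    for i, choice in enumerate(path):
        layer = layers_a[i] if choice == 0 else layers_b[i]
        x = layer(x)
    return x

def fitness(path):
    # Toy objective: get close to 7 when starting from input 4.
    # In the real method, fitness is the merged model's benchmark score.
    return -abs(run(path, 4) - 7)

def evolve(generations=50, pop=8, seed=0):
    # Elitist bit-flip evolution over the binary layer-selection path.
    rng = random.Random(seed)
    best = [rng.randint(0, 1) for _ in layers_a]
    for _ in range(generations):
        cands = [[c ^ (rng.random() < 0.3) for c in best] for _ in range(pop)]
        cands.append(best)  # elitism: never lose the incumbent
        best = max(cands, key=fitness)
    return best

best_path = evolve()
```

Because the search space here is discrete (which layers, in what order), it complements the continuous parameter-space search above; the paper combines both.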