Twin-Merging: Dynamic Integration of Modular Expertise in Model Merging

17 Jun 2024 | Zhenyi Lu¹·²*, Chenghao Fan¹·²*, Wei Wei¹·²†, Xiaoye Qu¹, Dangyang Chen³, Yu Cheng⁴
The paper introduces Twin-Merging, a novel method for integrating multiple task-specific models into a single multitask model without additional training. Traditional model merging methods often suffer from performance gaps compared to fine-tuned models due to interference between different models and heterogeneous data during testing. Twin-Merging addresses these issues by modularizing knowledge into shared and exclusive components, compressing the exclusive knowledge to enhance efficiency, and dynamically merging these components based on input data. This approach narrows the performance gap and improves adaptability to diverse data. Extensive experiments on 12 datasets for both discriminative and generative tasks demonstrate the effectiveness of Twin-Merging, showing an average improvement of 28.34% in normalized scores for discriminative tasks and surpassing the fine-tuned upper bound on generative tasks. The method is scalable, extensible, and storage-efficient, making it a powerful solution for combining multiple fine-tuned models into a single multitask model.
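To make the three steps concrete, here is a minimal NumPy sketch of the general recipe the summary describes: a shared expert, low-rank compression of the per-task "exclusive" deltas, and an input-dependent recombination. The function names, the simple-average shared expert, the SVD rank, and the example router weights are all illustrative assumptions, not the authors' reference implementation.

```python
import numpy as np

def build_shared_expert(finetuned: list[np.ndarray]) -> np.ndarray:
    """Shared expert: here a plain average of the fine-tuned weights.
    (Assumption: any off-the-shelf merging method could stand in.)"""
    return np.mean(finetuned, axis=0)

def compress_exclusive(theta_i: np.ndarray, theta_shared: np.ndarray,
                       rank: int = 8) -> tuple[np.ndarray, np.ndarray]:
    """Exclusive knowledge = the task-specific delta, compressed with a
    truncated SVD so only low-rank factors need to be stored."""
    delta = theta_i - theta_shared
    u, s, vt = np.linalg.svd(delta, full_matrices=False)
    return u[:, :rank] * s[:rank], vt[:rank, :]  # low-rank factors of delta

def dynamic_merge(theta_shared: np.ndarray,
                  factors: list[tuple[np.ndarray, np.ndarray]],
                  router_weights: np.ndarray) -> np.ndarray:
    """At inference, input-dependent weights recombine the deltas:
    theta(x) = theta_shared + sum_i w_i(x) * delta_i."""
    merged = theta_shared.copy()
    for w, (us, vt) in zip(router_weights, factors):
        merged += w * (us @ vt)  # reconstruct the compressed delta on the fly
    return merged

# Toy usage: three task "experts" over a single 64x64 weight matrix.
rng = np.random.default_rng(0)
experts = [rng.normal(size=(64, 64)) for _ in range(3)]
shared = build_shared_expert(experts)
factors = [compress_exclusive(e, shared, rank=8) for e in experts]
w = np.array([0.7, 0.2, 0.1])  # e.g. softmax of router logits for one input
theta_x = dynamic_merge(shared, factors, w)
```

The storage efficiency claimed in the abstract comes from the truncated SVD: each per-task delta is kept as two thin rank-`r` factors instead of a full weight matrix, and the dynamic merge reconstructs it only when needed.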