Training-Free Pretrained Model Merging

15 Mar 2024 | Zhengqi Xu¹, Ke Yuan¹, Huiqiong Wang², Yong Wang³, Mingli Song¹, Jie Song¹*
This paper proposes a training-free model merging framework called Merging under Dual-Space Constraints (MuDSC) to combine multiple pre-trained models into a single multi-talent model. Existing methods either require additional training or depend on the same pre-trained initialization, but MuDSC addresses these issues by considering both activation- and weight-space similarities. The framework linearly combines similarity matrices from both spaces to find a better permutation matrix for unit matching. It also incorporates adaptations for group structures such as Multi-Head Attention and Group Normalization. Experimental results show that MuDSC significantly improves the performance of merged models across various tasks and architectures. Visualization of the merged model in the multi-task loss landscape reveals that MuDSC enables the merged model to reside in the overlapping region with unified lower loss for each task.
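The core idea of combining the two similarity spaces can be sketched as follows. This is a minimal illustration, not the authors' implementation: `match_units` and `merge_layer` are hypothetical helpers, `alpha` is an assumed balancing coefficient for the linear combination of the two similarity matrices, and the permutation is solved here with the standard linear-sum-assignment formulation commonly used for unit matching.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_units(sim_act, sim_wt, alpha=0.5):
    """Find a permutation aligning model B's units to model A's.

    sim_act, sim_wt: (n, n) similarity matrices between the units of
    model A (rows) and model B (columns), computed in activation space
    and weight space respectively. alpha is an assumed knob balancing
    the two spaces in the linear combination.
    """
    sim = alpha * sim_act + (1.0 - alpha) * sim_wt
    # Maximizing total similarity = minimizing its negation.
    rows, cols = linear_sum_assignment(-sim)
    perm = np.zeros_like(sim)
    perm[rows, cols] = 1.0
    return perm

def merge_layer(w_a, w_b, perm, t=0.5):
    """Permute model B's units into model A's order, then interpolate."""
    w_b_aligned = perm @ w_b
    return t * w_a + (1.0 - t) * w_b_aligned
```

In this sketch the dual-space constraint enters only through the combined matrix `sim`; the paper's group-structure adaptations (e.g. keeping attention heads intact under the permutation) would further restrict which assignments are admissible.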
The code is publicly available at https://github.com/zju-vipa/training_free_model_merging.