15 Mar 2024 | Zhengqi Xu, Ke Yuan, Huiqiong Wang, Yong Wang, Mingli Song, Jie Song
This paper introduces MuDSC (Merging under Dual-Space Constraints), a training-free model merging framework that addresses the inconsistency of unit similarity between weight space and activation space. Traditional model merging methods either require additional training or fine-tuning, or rely on models sharing the same pre-trained initialization. MuDSC instead searches for permutation matrices in a region of high similarity in both weight and activation spaces, achieved through a linear combination of the activation and weight similarity matrices. The framework also includes adaptations for group structures such as Multi-Head Attention and Group Normalization.

Experimental results show that MuDSC significantly improves the performance of merged models across various tasks and architectures. Visualizing the merged model in the multi-task loss landscape reveals that MuDSC places it in an overlapping region with uniformly lower loss on each task. The proposed method outperforms existing techniques, particularly in handling heterogeneous tasks and improving merged-model accuracy. The code is publicly available for further research and application.
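The core idea — combining the weight-space and activation-space similarity matrices and then finding a unit permutation that scores well in both — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name, the mixing coefficient `alpha`, and the use of a linear assignment solver are assumptions for the sketch.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def dual_space_permutation(w_sim: np.ndarray, a_sim: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """Sketch of a dual-space constraint: linearly combine the weight-space
    and activation-space unit-similarity matrices, then solve a linear
    assignment problem for the permutation maximizing total similarity.
    `alpha` (assumed here) trades off the two spaces."""
    combined = alpha * w_sim + (1.0 - alpha) * a_sim
    rows, cols = linear_sum_assignment(combined, maximize=True)
    n = combined.shape[0]
    perm = np.zeros((n, n))
    perm[rows, cols] = 1.0  # one-hot rows/columns: a valid permutation matrix
    return perm
```

The returned permutation would then be applied to one model's units before averaging, so that units aligned in *both* spaces are merged together rather than units that match in only one space.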