2024 | Chenyu Huang, Peng Ye, Tao Chen, Tong He, Xiangyu Yue, Wanli Ouyang
EMR-MERGING: Tuning-Free High-Performance Model Merging
Chenyu Huang, Peng Ye, Tao Chen, Tong He, Xiangyu Yue, Wanli Ouyang
Abstract: The success of the pretrain-finetune paradigm has led to the release of numerous model weights. Merging models fine-tuned on different tasks to create a single model with multi-task capabilities is gaining attention for its practicality. Existing model merging methods suffer from performance degradation or require additional data or training. This paper proposes EMR-MERGING, which uses a unified model and lightweight task-specific modulators (masks and rescalers) to align the direction and magnitude between the unified model and each specific model. EMR-MERGING is tuning-free, requiring no data or additional training, and shows impressive performance. It outperforms existing methods in various settings, including merging different numbers of vision, NLP, PEFT, and multi-modal models.
Introduction: With the development of deep learning, various model architectures and training strategies have emerged. Pre-trained models have become increasingly significant, and fine-tuning them on downstream tasks has become a standard paradigm. However, maintaining an individual model for each task results in high storage and deployment costs. Multi-task learning (MTL) partially solves this problem but suffers from high computational costs and data unavailability. Model merging, which combines model weights without additional training, is a promising alternative.
This paper proposes EMR-MERGING, a new model merging paradigm that extracts a unified model weight and computes and stores significant but lightweight task-specific parts of each model weight. The process involves three steps: electing a unified model, generating task-specific modulators (masks and rescalers), and applying these modulators to align the direction and magnitude of the unified model with each specific model. Because the modulators are computed directly from the model weights, the procedure is tuning-free, requiring neither data nor additional training.
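The elect-mask-rescale procedure above can be sketched in NumPy. This is a simplified illustration based on the description in this summary, not the authors' reference implementation: it operates on flattened task vectors (finetuned weights minus pretrained weights), elects a unified task vector from the majority sign and the largest agreeing magnitude per element, and derives a binary mask plus a scalar rescaler per task.

```python
import numpy as np

def emr_merge(task_vectors):
    """Sketch of EMR-MERGING's Elect-Mask-Rescale steps on flattened
    task vectors (finetuned weights minus pretrained weights)."""
    tvs = np.stack(task_vectors)                      # (T, d)
    # Elect: unified sign = majority sign across tasks; unified magnitude =
    # max absolute value among task vectors agreeing with that sign.
    sign_uni = np.sign(tvs.sum(axis=0))               # (d,)
    agree = (np.sign(tvs) == sign_uni)                # (T, d) boolean
    mag_uni = np.where(agree, np.abs(tvs), 0.0).max(axis=0)
    tau_uni = sign_uni * mag_uni                      # unified task vector
    # Per-task modulators: a binary Mask (direction alignment) and a
    # scalar Rescaler (magnitude alignment).
    masks, rescalers = [], []
    for t in range(tvs.shape[0]):
        mask = agree[t]                               # keep shared-direction entries
        masked = mask * tau_uni
        # Rescaler matches the total magnitude of the original task vector.
        lam = np.abs(tvs[t]).sum() / (np.abs(masked).sum() + 1e-12)
        masks.append(mask)
        rescalers.append(lam)
    return tau_uni, masks, rescalers
```

At inference time for task t, the approximate task vector is `rescalers[t] * masks[t] * tau_uni`, added back onto the pretrained weights; only the unified vector, one bit-mask, and one scalar per task need to be stored.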
The proposed method is validated on various classical and newly established benchmarks, showing significant performance improvements. It is applicable to both fully finetuned models and PEFT modules, and it is also effective in merging multi-modal models, achieving the best performance on all vision-language tasks.
The method is simple but effective, and its effectiveness is supported by both theoretical and empirical analysis. It applies to vision, language, and multi-modal models alike, and its advantage over existing methods becomes more pronounced as the number of merged models increases.
Extensive experiments confirm these gains across a range of model families, including ViT, RoBERTa, GPT-2, and multi-modal BEiT3 models, as well as PEFT-adapted models, where the method again outperforms existing merging approaches.
The method is also effective in merging multi-modal models.