Merging Multi-Task Models via Weight-Ensembling Mixture of Experts

2024 | Anke Tang, Li Shen, Yong Luo, Nan Yin, Lefei Zhang, Dacheng Tao
This paper proposes a method for merging Transformer-based vision models fine-tuned on different tasks into a single multi-task model. The key idea is to merge most of the parameters directly while upscaling the multilayer perceptron (MLP) in each Transformer layer into a Weight-Ensembling Mixture-of-Experts (WEMoE) module. Rather than committing to one static merge, the WEMoE module combines the shared pretrained MLP weights with task-specific knowledge using routing coefficients computed from the input, so each instance receives the mixture of shared and task-specific knowledge it needs. The design of this module is the paper's main contribution.

The method is evaluated on conventional multi-task model merging benchmarks, where it outperforms existing merging methods in both accuracy and robustness. It generalizes well across tasks, remains effective on distorted as well as clean test data, and its performance is stable with respect to the scaling coefficient of the task vectors.

The paper also discusses the limitations of existing merging methods and the potential of the proposed approach in further settings, such as merging Transformers across different modalities and combining WEMoE with parameter-efficient fine-tuning. The authors conclude that WEMoE is a promising approach for merging multi-task models and can improve the efficiency and scalability of multi-task learning systems.
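To make the mechanism concrete, the following is a minimal PyTorch sketch of how an input-conditioned weight-ensembling MoE layer could be realized. It is not the authors' implementation: the class name WeightEnsemblingMoE, the two-layer router, and the use of mean token pooling for routing are illustrative assumptions, the base MLP weights and per-task task vectors (fine-tuned minus pretrained weights) are assumed to be supplied as tensors, and biases are omitted for brevity.

import torch
import torch.nn as nn


class WeightEnsemblingMoE(nn.Module):
    """Sketch of an input-conditioned weight-ensembling MoE layer.

    Effective MLP weights are built per sample as
        W(x) = W_pretrained + sum_t alpha_t(x) * tau_t,
    where tau_t is the task vector of task t (fine-tuned minus pretrained
    weights) and alpha(x) are routing coefficients predicted from the input.
    """

    def __init__(self, d_model, num_tasks, base_fc1, base_fc2, tv_fc1, tv_fc2):
        super().__init__()
        # Frozen shared base: the pretrained MLP weights.
        self.register_buffer("base_fc1", base_fc1)  # (d_hidden, d_model)
        self.register_buffer("base_fc2", base_fc2)  # (d_model, d_hidden)
        # Task vectors, stacked over tasks.
        self.register_buffer("tv_fc1", tv_fc1)      # (num_tasks, d_hidden, d_model)
        self.register_buffer("tv_fc2", tv_fc2)      # (num_tasks, d_model, d_hidden)
        # Router: maps an input summary to one merging coefficient per task.
        self.router = nn.Sequential(
            nn.Linear(d_model, d_model),
            nn.ReLU(),
            nn.Linear(d_model, num_tasks),
        )
        self.act = nn.GELU()

    def forward(self, x):  # x: (batch, seq, d_model)
        # One routing decision per sample, from the mean token embedding.
        coeff = self.router(x.mean(dim=1))  # (batch, num_tasks)
        # Sample-specific weights: shared base plus weighted task vectors.
        fc1 = self.base_fc1 + torch.einsum("bt,thd->bhd", coeff, self.tv_fc1)
        fc2 = self.base_fc2 + torch.einsum("bt,tdh->bdh", coeff, self.tv_fc2)
        # Standard two-layer MLP, but evaluated with per-sample weights.
        h = self.act(torch.einsum("bsd,bhd->bsh", x, fc1))
        return torch.einsum("bsh,bdh->bsd", h, fc2)


# Hypothetical usage with ViT-Base-like dimensions and two merged tasks.
d_model, d_hidden, num_tasks = 768, 3072, 2
layer = WeightEnsemblingMoE(
    d_model, num_tasks,
    base_fc1=torch.randn(d_hidden, d_model),
    base_fc2=torch.randn(d_model, d_hidden),
    tv_fc1=torch.randn(num_tasks, d_hidden, d_model),
    tv_fc2=torch.randn(num_tasks, d_model, d_hidden),
)
out = layer(torch.randn(4, 197, d_model))  # -> (4, 197, 768)

In this sketch only the router carries trainable parameters; the shared base weights and task vectors are fixed buffers, which mirrors the idea of keeping most merged parameters static while letting the routing decide, per input, how much of each task's knowledge to mix in.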