2024 | Anke Tang, Li Shen, Yong Luo, Nan Yin, Lefei Zhang, Dacheng Tao
This paper proposes a method for merging Transformer-based vision models fine-tuned on different tasks into a single multi-task model. The key insight is to separate shared from task-specific knowledge and integrate them dynamically via a Weight-Ensembling Mixture of Experts (WEMoE) module. This module upcycles the MLPs in the Transformer layers so that shared and task-specific weights are combined based on the input, mitigating parameter interference between tasks. In conventional multi-task model merging experiments, the method proves effective and flexible, outperforming existing approaches in both performance and adaptability.
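To make the weight-ensembling idea concrete, below is a minimal PyTorch sketch of an input-conditioned weight mixture for a single linear layer. This is an illustration under stated assumptions, not the paper's implementation: the class name `WEMoELinear`, the two-layer router, and routing on the mean token representation are assumptions; only the core idea (a frozen shared pretrained weight plus router-weighted task vectors assembled per input) follows the paper's description.

```python
import torch
import torch.nn as nn


class WEMoELinear(nn.Module):
    """Hypothetical sketch of a weight-ensembling MoE linear layer.

    The pretrained weight is kept as shared knowledge; each expert is a
    task vector (fine-tuned weight minus pretrained weight). A small
    router maps the input features to per-expert coefficients, and the
    effective weight is assembled anew on every forward pass.
    """

    def __init__(self, w_pretrained, task_weights, router_hidden=16):
        super().__init__()
        # Shared knowledge: the frozen pretrained weight (out_dim, in_dim).
        self.w_shared = nn.Parameter(w_pretrained, requires_grad=False)
        # Task-specific knowledge: one task vector per fine-tuned model,
        # stacked into shape (num_experts, out_dim, in_dim).
        self.task_vectors = nn.Parameter(
            torch.stack([w - w_pretrained for w in task_weights]),
            requires_grad=False,
        )
        in_dim = w_pretrained.shape[1]
        num_experts = len(task_weights)
        # Router (assumed architecture): maps input features to one
        # mixing coefficient per expert.
        self.router = nn.Sequential(
            nn.Linear(in_dim, router_hidden),
            nn.ReLU(),
            nn.Linear(router_hidden, num_experts),
        )

    def forward(self, x):
        # x: (batch, seq, in_dim); route on the mean token representation,
        # yielding one coefficient vector per sample.
        coeff = self.router(x.mean(dim=1))  # (batch, num_experts)
        # Dynamically merged weight: shared base + weighted task vectors.
        w = self.w_shared + torch.einsum(
            "be,eoi->boi", coeff, self.task_vectors
        )  # (batch, out_dim, in_dim)
        return torch.einsum("boi,bsi->bso", w, x)


# Example: merge two task-specific weight matrices that share a
# pretrained initialization (random tensors stand in for real weights).
w_pre = torch.randn(64, 32)
layer = WEMoELinear(w_pre, [torch.randn(64, 32), torch.randn(64, 32)])
y = layer(torch.randn(4, 10, 32))  # -> shape (4, 10, 64)
```

Because the mixing happens in weight space rather than on expert outputs, only one forward pass through the (merged) MLP is needed per input, while the router still lets each sample draw on a different blend of task-specific knowledge.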