TASK-CUSTOMIZED MASKED AUTOENCODER VIA MIXTURE OF CLUSTER-CONDITIONAL EXPERTS


8 Feb 2024 | Zhili Liu*, Kai Chen*, Jianhua Han, Lanqing Hong, Hang Xu, Zhenguo Li, James T. Kwok
The paper addresses negative transfer in Masked Autoencoders (MAE): a model pre-trained on data that is semantically irrelevant to a downstream task can transfer poorly when the data distributions differ. To tackle this, the authors propose Mixture of Cluster-conditional Experts (MoCE), a paradigm for task-customized pre-training. MoCE trains each expert only on semantically related images by routing them through cluster-conditional gates, so that, unlike the token-level routing of a conventional Mixture of Experts (MoE), all images from similar clusters are sent to the same expert, which improves transfer performance. Evaluated on 11 downstream tasks, MoCE improves over vanilla MAE by 2.45% on average and achieves state-of-the-art results on detection and segmentation. The key contributions are a systematic analysis of negative transfer in MAE, the MoCE paradigm itself, and extensive experiments demonstrating its effectiveness.
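To make the cluster-conditional routing concrete, here is a minimal sketch of what such a gate could look like, assuming PyTorch and a pre-computed cluster id per image (e.g., from k-means on MAE features). The names `MoCEGate`, `num_experts`, and `top_k` are illustrative and not taken from the paper's code; the actual implementation may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoCEGate(nn.Module):
    """Illustrative cluster-conditional gate (assumption, not the paper's code).

    Unlike a token-level MoE gate, the routing decision here depends only on a
    learned per-cluster embedding, so semantically similar images (same cluster)
    always share the same expert(s) during pre-training.
    """
    def __init__(self, num_clusters: int, num_experts: int, dim: int, top_k: int = 1):
        super().__init__()
        self.cluster_embed = nn.Embedding(num_clusters, dim)  # one embedding per cluster
        self.router = nn.Linear(dim, num_experts)             # cluster embedding -> expert logits
        self.top_k = top_k

    def forward(self, cluster_ids: torch.Tensor):
        # cluster_ids: (batch,) integer cluster assignment of each image
        logits = self.router(self.cluster_embed(cluster_ids))   # (batch, num_experts)
        weights, experts = logits.topk(self.top_k, dim=-1)      # top-k experts per cluster
        weights = F.softmax(weights, dim=-1)
        return experts, weights  # chosen experts per image and their mixing weights


# Usage: images from the same cluster always receive identical routing.
gate = MoCEGate(num_clusters=256, num_experts=8, dim=64)
experts, weights = gate(torch.tensor([3, 3, 17]))  # first two images share a cluster -> same expert
```

The design point this sketch captures is that the gate conditions on the cluster rather than on individual image tokens, which is what keeps each expert's training data semantically coherent.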