Med-MoE: Mixture of Domain-Specific Experts for Lightweight Medical Vision-Language Models

1 Sep 2024 | Songtao Jiang*, Tuo Zheng*, Yan Zhang, Yeying Jin, Li Yuan and Zuozhu Liu
The paper introduces Med-MoE (Mixture of Domain-Specific Experts), a lightweight framework designed for multimodal medical tasks, covering both discriminative and generative settings. Med-MoE addresses the challenges of resource-constrained environments by aligning medical images with language model tokens, enabling instruction tuning, and fine-tuning domain-specific experts. Training proceeds in three phases: multimodal medical alignment, instruction tuning and routing, and domain-specific MoE tuning. The routing mechanism identifies the input modality and dispatches tokens to the appropriate domain-specific experts, while a shared meta-expert captures global medical knowledge. On medical datasets such as VQA-RAD, SLAKE, and PathVQA, the model achieves performance superior or comparable to state-of-the-art baselines while activating significantly fewer parameters. Extensive ablation studies and comparisons with other methods highlight the effectiveness and practical utility of Med-MoE in resource-limited healthcare settings.
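To make the architecture concrete, below is a minimal PyTorch sketch of a Med-MoE-style layer as described in the summary: a router scores each token and activates a single domain-specific expert, while an always-active meta-expert contributes a shared output. The class and parameter names, the top-1 routing choice, and the additive combination of meta-expert and domain-expert outputs are illustrative assumptions based on this summary, not the authors' released implementation.

```python
# Sketch of a Med-MoE-style layer: a router picks one domain-specific expert
# per token (keeping activated parameters low), and a meta-expert that is
# always on contributes global medical knowledge. All names and hyperparameters
# here are illustrative assumptions, not the paper's actual code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ExpertFFN(nn.Module):
    """A standard two-layer feed-forward expert."""

    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_model, d_hidden),
            nn.GELU(),
            nn.Linear(d_hidden, d_model),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


class MedMoELayer(nn.Module):
    """Mixture of domain-specific experts plus an always-active meta-expert."""

    def __init__(self, d_model: int = 512, d_hidden: int = 2048, n_experts: int = 4):
        super().__init__()
        # Router scores each token against the domain experts
        # (e.g. radiology, pathology); only the top-1 expert runs per token.
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            ExpertFFN(d_model, d_hidden) for _ in range(n_experts)
        )
        # Meta-expert is evaluated for every token, capturing global information.
        self.meta_expert = ExpertFFN(d_model, d_hidden)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model); flatten tokens for per-token routing.
        b, s, d = x.shape
        tokens = x.reshape(-1, d)
        gate = F.softmax(self.router(tokens), dim=-1)   # (n_tokens, n_experts)
        weight, idx = gate.max(dim=-1)                  # top-1 routing decision
        expert_out = torch.zeros_like(tokens)
        for e, expert in enumerate(self.experts):       # dispatch per expert
            mask = idx == e
            if mask.any():
                expert_out[mask] = weight[mask, None] * expert(tokens[mask])
        out = self.meta_expert(tokens) + expert_out     # meta-expert always added
        return out.reshape(b, s, d)


if __name__ == "__main__":
    layer = MedMoELayer()
    x = torch.randn(2, 16, 512)
    print(layer(x).shape)  # torch.Size([2, 16, 512])
```

Because only the selected expert and the meta-expert run per token, adding more domain experts grows total capacity without increasing the number of activated parameters per forward pass, which is the property the paper leverages for resource-limited deployment.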