MoE-LLaVA: Mixture of Experts for Large Vision-Language Models

6 Jul 2024 | Bin Lin, Zhenyu Tang, Yang Ye, Jiaxi Cui, Bin Zhu, Peng Jin, Jinfa Huang, Junwu Zhang, Yatian Pang, Munan Ning, Li Yuan
This paper proposes MoE-Tuning, a three-stage training strategy for Large Vision-Language Models (LVLMs) that prevents the performance degradation typically caused by sparsity. Building on it, MoE-LLaVA is a MoE-based sparse LVLM architecture that activates only the top-k experts per token during deployment and keeps the remaining experts inactive. With only approximately 3B sparsely activated parameters, MoE-LLaVA performs comparably to LLaVA-1.5-7B on visual understanding benchmarks and surpasses LLaVA-1.5-13B on the object hallucination benchmark. The model is intended to serve as a baseline for sparse LVLMs and to offer insights for future research on efficient and effective multi-modal learning systems. The code is available at https://github.com/PKU-YuanGroup/MoE-LLaVA.
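To illustrate the top-k expert activation described above, below is a minimal PyTorch sketch of a sparse MoE feed-forward layer. This is not the authors' implementation (see the GitHub repository for that); the class name TopKMoE, the expert MLP shape, and all hyperparameters are illustrative assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Illustrative sparse MoE FFN: a router scores all experts per token,
    but only the top-k experts are evaluated; the rest stay inactive."""
    def __init__(self, dim, hidden_dim, num_experts=4, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(dim, num_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, hidden_dim), nn.GELU(), nn.Linear(hidden_dim, dim))
            for _ in range(num_experts)
        )

    def forward(self, x):                                   # x: (tokens, dim)
        logits = self.router(x)                             # (tokens, num_experts)
        weights = F.softmax(logits, dim=-1)
        topk_w, topk_idx = weights.topk(self.top_k, dim=-1) # keep only top-k experts
        topk_w = topk_w / topk_w.sum(dim=-1, keepdim=True)  # renormalize over active experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = topk_idx[:, slot] == e               # tokens routed to expert e
                if mask.any():
                    out[mask] += topk_w[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

For example, TopKMoE(dim=4096, hidden_dim=11008, num_experts=4, top_k=2)(tokens) computes each token's output from just its two selected experts; renormalizing the routing weights over the active experts is one common convention, and is an assumption here rather than a detail taken from the paper.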