MM-LLMs: Recent Advances in MultiModal Large Language Models

28 May 2024 | Duzhen Zhang, Yahan Yu, Jiahua Dong, Chenxing Li, Dan Su, Chenhui Chu, Dong Yu
This paper presents a comprehensive survey of recent advances in MultiModal Large Language Models (MM-LLMs). The authors summarize the design formulations and training pipelines of MM-LLMs, introduce a taxonomy covering 126 state-of-the-art MM-LLMs, and review their performance on mainstream benchmarks. They also distill key training recipes for improving MM-LLM effectiveness and explore promising directions for future research.

The survey traces the progress of MM-LLMs from multimodal understanding toward generation of specific modalities and, increasingly, any-to-any modality conversion. It describes the typical training pipeline, consisting of MM Pre-Training (PT) followed by MM Instruction-Tuning (IT), and stresses the importance of aligning model behavior with human intent. Proposed future directions include more general and intelligent models, more challenging benchmarks, lightweight deployment, embodied intelligence, continual learning, and mitigating hallucination and biases. The authors conclude that MM-LLMs have the potential to benefit society but also pose risks that must be addressed, acknowledge the limitations of their survey, and emphasize the importance of ongoing research and development in this field.
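To make the two-stage pipeline mentioned above more concrete, the following is a minimal, illustrative sketch, not the paper's own code. It assumes a common MM-LLM design in which a frozen vision encoder is bridged to a frozen LLM by a small trainable projector; all class and function names (ToyMMLLM, train_stage, and the toy layer sizes) are hypothetical placeholders.

```python
# Minimal sketch of the two-stage training pipeline described in the survey:
# MM Pre-Training (PT) to align modalities, then MM Instruction-Tuning (IT)
# to align with human intent. Everything here is a simplified assumption,
# not the architecture or code of any specific MM-LLM.
import torch
import torch.nn as nn


class ToyMMLLM(nn.Module):
    """Toy MM-LLM: frozen vision encoder + trainable projector + frozen LLM."""

    def __init__(self, vision_dim=768, llm_dim=1024, vocab_size=32000):
        super().__init__()
        self.vision_encoder = nn.Linear(3 * 224 * 224, vision_dim)  # stand-in for a ViT
        self.projector = nn.Linear(vision_dim, llm_dim)             # the trainable bridge
        self.llm = nn.TransformerEncoderLayer(llm_dim, nhead=8, batch_first=True)
        self.lm_head = nn.Linear(llm_dim, vocab_size)
        # Freeze the backbone components, as is common during MM PT.
        for p in self.vision_encoder.parameters():
            p.requires_grad = False
        for p in self.llm.parameters():
            p.requires_grad = False

    def forward(self, images, text_embeds):
        # Project visual features into the LLM embedding space and prepend them.
        vis = self.projector(self.vision_encoder(images.flatten(1))).unsqueeze(1)
        hidden = self.llm(torch.cat([vis, text_embeds], dim=1))
        return self.lm_head(hidden)


def train_stage(model, dataloader, params, steps):
    """Generic loop reused for both PT (paired caption data) and IT (instruction data)."""
    opt = torch.optim.AdamW(params, lr=1e-4)
    loss_fn = nn.CrossEntropyLoss()
    for step, (images, text_embeds, labels) in zip(range(steps), dataloader):
        logits = model(images, text_embeds)
        loss = loss_fn(logits.flatten(0, 1), labels.flatten())
        opt.zero_grad()
        loss.backward()
        opt.step()


# Stage 1 (MM PT): train only the projector on image-text pairs to align modalities.
# Stage 2 (MM IT): additionally tune the LLM (often via lightweight adapters) on
# instruction-formatted multimodal data to align responses with human intent.
```

The split between the two stages is the main point of the sketch: pre-training bridges modalities cheaply by updating only the connector, while instruction-tuning teaches the combined model to follow multimodal instructions.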