MM-LLMs: Recent Advances in MultiModal Large Language Models

28 May 2024 | Duzhen Zhang, Yahan Yu, Jiahua Dong, Chenxing Li, Dan Su, Chenhui Chu, Dong Yu
The paper provides a comprehensive survey of MultiModal Large Language Models (MM-LLMs), which integrate Large Language Models (LLMs) with other modalities to enhance their capabilities in various multimodal tasks. The authors outline the general design formulations for model architecture and training pipelines, introduce a taxonomy of 126 MM-LLMs, review their performance on mainstream benchmarks, and summarize key training recipes. They also explore promising future directions, including more general and intelligent models, more challenging benchmarks, mobile/lightweight deployment, embodied intelligence, continual learning, mitigating hallucinations, and addressing biases and ethical considerations. The paper aims to facilitate further research and contribute to the ongoing advancements in the field of MM-LLMs.