Efficient Multimodal Large Language Models: A Survey

9 Aug 2024 | Yizhang Jin, Jian Li, Yexin Liu, Tianjun Gu, Kai Wu, Zhengkai Jiang, Muyang He, Bo Zhao, Xin Tan, Zhenye Gan, Yabiao Wang, Chengjie Wang, Lizhuang Ma
This survey provides a comprehensive review of the current state of efficient Multimodal Large Language Models (MLLMs), covering their architecture, efficient vision, efficient LLMs, training, data and benchmarks, and applications. It highlights the challenges of deploying large-scale MLLMs due to their high computational and memory costs, and examines strategies for making them more efficient, including lightweight models, efficient structures, and optimization techniques. It also discusses the limitations of current research and outlines future directions for efficient MLLMs. An accompanying GitHub repository organizes the featured papers according to the same taxonomy. The survey is structured into sections on the introduction, architecture, efficient vision, efficient LLMs, training, data and benchmarks, and applications, offering an in-depth analysis of the development of efficient MLLMs, their key components, and their potential applications across domains. Its aim is to provide a comprehensive understanding of the current state of the art in efficient MLLMs and to serve as a roadmap for future research in this field.
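To make the architecture discussion concrete, the typical MLLM design surveyed here couples a vision encoder with a lightweight projector that maps visual features into the language model's embedding space before they are consumed by the LLM backbone. The sketch below is a minimal, illustrative PyTorch skeleton of that pipeline only; the class name, dimensions, and the use of a single linear layer and a tiny Transformer as stand-ins for a real vision encoder and LLM are assumptions for illustration, not code from the survey or any specific model.

```python
import torch
import torch.nn as nn

class MiniMLLM(nn.Module):
    """Illustrative MLLM skeleton: vision encoder -> projector -> LLM backbone.
    All names and dimensions are placeholders, not the survey's implementation."""

    def __init__(self, vision_dim=768, llm_dim=2048, vocab_size=32000):
        super().__init__()
        # Stand-in for a pretrained vision encoder (e.g., a ViT); here just a linear layer.
        self.vision_encoder = nn.Linear(3 * 16 * 16, vision_dim)
        # Lightweight projector mapping visual features into the LLM embedding space.
        self.projector = nn.Sequential(
            nn.Linear(vision_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )
        # Stand-in for the language model backbone.
        self.llm = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=llm_dim, nhead=8, batch_first=True),
            num_layers=2,
        )
        self.text_embed = nn.Embedding(vocab_size, llm_dim)
        self.lm_head = nn.Linear(llm_dim, vocab_size)

    def forward(self, image_patches, text_tokens):
        # image_patches: (batch, num_patches, 3*16*16); text_tokens: (batch, seq_len)
        visual_tokens = self.projector(self.vision_encoder(image_patches))
        text_embeds = self.text_embed(text_tokens)
        # Prepend projected visual tokens to the text sequence.
        fused = torch.cat([visual_tokens, text_embeds], dim=1)
        hidden = self.llm(fused)
        return self.lm_head(hidden)


# Usage sketch
model = MiniMLLM()
imgs = torch.randn(2, 64, 3 * 16 * 16)      # 64 flattened 16x16 patches per image
tokens = torch.randint(0, 32000, (2, 16))   # 16 text tokens per example
logits = model(imgs, tokens)                # shape: (2, 64 + 16, 32000)
```

Efficiency work surveyed in the paper targets each of these components, for example by shrinking the vision encoder, reducing the number of visual tokens passed through the projector, or replacing the LLM backbone with a lightweight model.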