From Large Language Models to Large Multimodal Models: A Literature Review

This paper provides a comprehensive review of the evolution from Large Language Models (LLMs) to Large Multimodal Models (LMMs), aiming to summarize recent progress and offer a unified perspective on both types of models. The authors begin by outlining the key techniques and conceptual frameworks of LLMs, including architectural designs, pretraining strategies, fine-tuning methods, and prompt engineering. They then delve into the architectural components, training strategies, instruction tuning, and prompt engineering of LMMs, focusing on the vision-language domain. A taxonomy of 66 cutting-edge vision-language LMMs is presented, detailing their components and performance. Finally, the paper offers a unified analysis of both LLMs and LMMs, discussing the development status of large-scale models globally and suggesting potential future research directions. The review highlights the connections between LLMs and LMMs, emphasizing the importance of a unified perspective in understanding the expansion from LLMs to LMMs.

2024 | Dawei Huang, Chuan Yan, Qing Li, Xiaojiang Peng