This survey provides a comprehensive overview of resource-efficient large foundation models, covering both large language models (LLMs) and multimodal foundation models. The rapid development of AI has produced versatile foundation models that can perform a wide range of tasks, from language to vision and multimodal understanding. However, these models demand significant computational resources for training and deployment, posing challenges to scalability and sustainability. The survey examines resource-efficient strategies for developing such models from both algorithmic and system-level perspectives, analyzing the existing literature on model architectures, training algorithms, system designs, and implementations. Its goal is to explain how current approaches address the resource challenges of large foundation models and to inspire future breakthroughs in this field.
The survey focuses on the key components of foundation models, spanning language, vision, and multimodal models. It discusses their architectures, representative models, and the downstream tasks they serve. It also analyzes the computational and storage costs associated with these models, highlighting the challenges in training and inference, and breaks down the costs of the individual modules of multimodal foundation models, such as the multi-encoder module, the decoder module, and the FD module.
The survey also delves into resource-efficient architectures, including efficient attention mechanisms, dynamic neural networks, and diffusion-specific optimizations. These approaches aim to reduce computational and memory costs while maintaining model performance. The survey highlights various techniques for achieving efficient attention, such as sparse attention, approximate attention, and attention-free approaches. It also discusses dynamic neural networks, including mixture-of-experts (MoE) and state space models (SSMs), which offer efficient and scalable solutions for large foundation models.
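To make the cost argument concrete, the sketch below illustrates one of the sparse-attention ideas mentioned above: a causal sliding-window attention in which each query attends only to its most recent keys, reducing the quadratic O(n²·d) cost of full attention to O(n·window·d). This is an illustrative toy implementation, not code from any surveyed system; the function name and list-of-lists representation are our own assumptions.

```python
import math
import random

def sliding_window_attention(q, k, v, window=4):
    # Sparse-attention sketch (illustrative only): query i attends to keys
    # in the causal window [i - window + 1, i], so per-query work is
    # O(window * d) instead of O(n * d).
    # q, k, v: lists of length-d float vectors; returns a list of the same shape.
    n, d = len(q), len(q[0])
    out = []
    for i in range(n):
        lo = max(0, i - window + 1)                  # causal window start
        scores = [sum(qi * kj for qi, kj in zip(q[i], k[j])) / math.sqrt(d)
                  for j in range(lo, i + 1)]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]     # numerically stable softmax
        z = sum(exps)
        weights = [e / z for e in exps]
        # Weighted sum of the windowed values.
        out.append([sum(w * v[j][t] for w, j in zip(weights, range(lo, i + 1)))
                    for t in range(d)])
    return out

random.seed(0)
x = [[random.gauss(0, 1) for _ in range(8)] for _ in range(6)]
y = sliding_window_attention(x, x, x, window=3)
print(len(y), len(y[0]))  # prints "6 8"
```

The same locality idea underlies many of the surveyed sparse-attention designs; production systems replace the Python loop with blocked GPU kernels, but the asymptotic saving is the same.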
Overall, the survey provides a comprehensive understanding of the current state and future directions of resource-efficient algorithms and systems in the realm of foundation models. It emphasizes the importance of developing efficient and sustainable approaches to address the growing resource demands of large foundation models.