This survey provides a comprehensive overview of resource-efficient large foundation models, covering both large language models (LLMs) and multimodal foundation models. The rapid development of AI has produced versatile foundation models that can perform a wide range of tasks, from language to vision and multimodal understanding. However, these models demand significant computational resources for training and deployment, posing challenges to scalability and sustainability. The survey examines resource-efficient strategies for developing such models from both algorithmic and system-level perspectives, analyzing the existing literature on model architectures, training algorithms, system designs, and implementations. Its goal is to explain how current approaches address the resource challenges of large foundation models and to inspire future breakthroughs in this field.
The survey focuses on the key components of foundation models, spanning language, vision, and multimodal models. It discusses their architectures, representative models, and the downstream tasks they serve. It also analyzes the computational and storage costs associated with these models, highlighting the challenges in training and inference, and breaks down the costs of the individual modules of multimodal foundation models, such as the multi-encoder module, the decoder module, and the FD module.
The survey also delves into resource-efficient architectures, including efficient attention mechanisms, dynamic neural networks, and diffusion-specific optimizations. These approaches aim to reduce computational and memory costs while maintaining model performance. The survey highlights various techniques for achieving efficient attention, such as sparse attention, approximate attention, and attention-free approaches. It also discusses dynamic neural networks, including mixture-of-experts (MoE) and state space models (SSMs), which offer efficient and scalable solutions for large foundation models.
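To make the cost argument concrete, the sketch below illustrates one of the sparse-attention ideas mentioned above: a causal sliding-window attention in which each query attends only to its most recent keys, reducing the quadratic O(n²·d) cost of full attention to O(n·window·d). This is an illustrative toy implementation, not code from any surveyed system; the function name and list-of-lists representation are our own assumptions.

```python
import math
import random

def sliding_window_attention(q, k, v, window=4):
    # Sparse-attention sketch (illustrative only): query i attends to keys
    # in the causal window [i - window + 1, i], so per-query work is
    # O(window * d) instead of O(n * d).
    # q, k, v: lists of length-d float vectors; returns a list of the same shape.
    n, d = len(q), len(q[0])
    out = []
    for i in range(n):
        lo = max(0, i - window + 1)                  # causal window start
        scores = [sum(qi * kj for qi, kj in zip(q[i], k[j])) / math.sqrt(d)
                  for j in range(lo, i + 1)]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]     # numerically stable softmax
        z = sum(exps)
        weights = [e / z for e in exps]
        # Weighted sum of the windowed values.
        out.append([sum(w * v[j][t] for w, j in zip(weights, range(lo, i + 1)))
                    for t in range(d)])
    return out

random.seed(0)
x = [[random.gauss(0, 1) for _ in range(8)] for _ in range(6)]
y = sliding_window_attention(x, x, x, window=3)
print(len(y), len(y[0]))  # prints "6 8"
```

The same locality idea underlies many of the surveyed sparse-attention designs; production systems replace the Python loop with blocked GPU kernels, but the asymptotic saving is the same.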
Overall, the survey provides a comprehensive understanding of the current state and future directions of resource-efficient algorithms and systems in the realm of foundation models. It emphasizes the importance of developing efficient and sustainable approaches to address the growing resource demands of large foundation models.