February 9, 2024 | Kento Kawaharazuka, Tatsuya Matsushima, Andrew Gambardella, Jiaxian Guo, Chris Paxton, Andy Zeng
This review article explores the practical applications of foundation models in real-world robotics, focusing on their role in replacing specific components within existing robot systems. Foundation models, such as Large Language Models (LLMs) and Vision-Language Models (VLMs), are trained on extensive data and can be applied to a wide range of tasks through in-context learning, fine-tuning, or zero-shot learning. The paper categorizes the utilization of foundation models in robotics into five main categories: low-level perception, high-level perception, high-level planning, low-level planning, and data augmentation. It provides detailed examples and case studies for each category, highlighting how foundation models can enhance robot capabilities in areas such as feature extraction, scene recognition, goal generation, map construction, and motion planning.
The article also discusses the development of robotic foundation models, which integrate perception, planning, and control for more sophisticated applications. Finally, it addresses future challenges and implications for practical robot applications.