Aug. 2024 | Haokun Liu, Yaonan Zhu, Kenji Kato, Atsushi Tsukahara, Izumi Kondo, Tadayoshi Aoyama, and Yasuhisa Hasegawa
This paper proposes a novel approach to enhancing the performance of LLM-based autonomous manipulation through Human-Robot Collaboration (HRC). The approach uses a prompted GPT-4 language model to decompose high-level language commands into sequences of motions that the robot can execute. The system also employs a YOLO-based perception algorithm to provide visual cues to the LLM, aiding in planning feasible motions within the specific environment. Additionally, an HRC method combining teleoperation and Dynamic Movement Primitives (DMP) is proposed, allowing the LLM-based robot to learn from human guidance. Real-world manipulation experiments were conducted on the Toyota Human Support Robot. The outcomes indicate that tasks requiring complex trajectory planning and reasoning about the environment can be accomplished efficiently by incorporating human demonstrations.
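To give a rough sense of the DMP component named above, the following is a minimal one-dimensional sketch of a standard Ijspeert-style discrete DMP learned from a single demonstrated trajectory. The parameter values and helper names here are illustrative assumptions, not the paper's implementation, which couples DMPs to teleoperated demonstrations on the real robot.

```python
# Minimal 1-D discrete DMP sketch (standard Ijspeert-style formulation).
# Parameter choices and function names are illustrative, not the paper's.

import numpy as np


def learn_dmp(y_demo, dt, n_basis=20, alpha_z=25.0, beta_z=6.25, alpha_x=3.0):
    """Fit forcing-term weights from one demonstrated trajectory."""
    tau = len(y_demo) * dt
    y0, g = y_demo[0], y_demo[-1]
    yd = np.gradient(y_demo, dt)
    ydd = np.gradient(yd, dt)

    # Canonical phase x decays from 1 toward 0 over the demonstration.
    t = np.arange(len(y_demo)) * dt
    x = np.exp(-alpha_x * t / tau)

    # Forcing term that would reproduce the demonstration exactly.
    f_target = tau**2 * ydd - alpha_z * (beta_z * (g - y_demo) - tau * yd)

    # Gaussian basis functions spread along the phase variable.
    centers = np.exp(-alpha_x * np.linspace(0, 1, n_basis))
    widths = n_basis**1.5 / centers / alpha_x
    psi = np.exp(-widths * (x[:, None] - centers[None, :]) ** 2)

    # Locally weighted regression for each basis weight.
    xi = x * (g - y0)
    w = np.sum(psi * (xi * f_target)[:, None], axis=0) / \
        (np.sum(psi * (xi**2)[:, None], axis=0) + 1e-10)
    return dict(w=w, centers=centers, widths=widths, y0=y0, g=g,
                tau=tau, alpha_z=alpha_z, beta_z=beta_z, alpha_x=alpha_x)


def rollout(dmp, dt, g=None):
    """Integrate the DMP, optionally toward a new goal g."""
    g = dmp["g"] if g is None else g
    y, z, x = dmp["y0"], 0.0, 1.0
    out = []
    for _ in range(int(dmp["tau"] / dt)):
        psi = np.exp(-dmp["widths"] * (x - dmp["centers"]) ** 2)
        f = psi @ dmp["w"] / (psi.sum() + 1e-10) * x * (g - dmp["y0"])
        zd = (dmp["alpha_z"] * (dmp["beta_z"] * (g - y) - z) + f) / dmp["tau"]
        yd = z / dmp["tau"]
        xd = -dmp["alpha_x"] * x / dmp["tau"]
        z, y, x = z + zd * dt, y + yd * dt, x + xd * dt
        out.append(y)
    return np.array(out)


# Usage: learn from a demonstrated curve, then replay toward a new goal,
# which is how a demonstrated motion generalizes to a changed target.
demo = np.sin(np.linspace(0, np.pi / 2, 200))
dmp = learn_dmp(demo, dt=0.01)
traj = rollout(dmp, dt=0.01, g=1.5)
```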
The proposed system integrates an LLM with environmental information within the Robot Operating System (ROS) to construct an LLM-based autonomous system; this integration enables the translation of human commands into specific robotic motions. To extend the system's ability to execute complex tasks, an HRC method is adopted that guides robot motion through human demonstrations. The system relies on two libraries for motion execution: a basic library of pre-programmed motion functions, and a DMP library that stores updated motion-function sequences for sub-tasks.
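To make the two-library idea concrete, below is a minimal Python sketch of how such a structure might be organized. The motion function names (`move_to`, `grasp`, `release`) and the dictionary layout are assumptions for illustration only; the actual motions run through ROS and the real DMP entries encode demonstrated trajectories, which are omitted here.

```python
# Sketch of a two-library motion store with hypothetical function names.

from typing import Callable, Dict, List


def move_to(target: str) -> None:
    """Pre-programmed motion: move the end effector toward a named target."""
    print(f"moving to {target}")


def grasp(obj: str) -> None:
    """Pre-programmed motion: close the gripper on an object."""
    print(f"grasping {obj}")


def release(obj: str) -> None:
    """Pre-programmed motion: open the gripper."""
    print(f"releasing {obj}")


# Basic library: fixed motion functions the LLM may select and sequence.
BASIC_LIBRARY: Dict[str, Callable[[str], None]] = {
    "move_to": move_to,
    "grasp": grasp,
    "release": release,
}

# DMP library: motion-function sequences recorded or updated from human
# demonstrations, covering sub-tasks the basic library alone cannot solve.
DMP_LIBRARY: Dict[str, List[str]] = {
    "open_drawer": ["move_to('drawer_handle')", "grasp('drawer_handle')",
                    "move_to('pull_back')", "release('drawer_handle')"],
}
```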
The LLM selects motion functions from the basic library according to the natural language command and combines them with environmental information to generate Pythonic code. A hierarchical planning framework allows the LLM to decompose complex tasks into sub-tasks and execute the corresponding motion functions sequentially. The system also incorporates a teleoperation-based HRC framework for motion demonstration, enabling the LLM-based robot to learn from human demonstrations.
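As a hedged illustration of the kind of Pythonic plan the prompted LLM might emit for a decomposed task, the sketch below runs two sub-tasks sequentially over mocked perception output. The command, sub-task names, motion stubs, and the `detections` dictionary are assumptions for this example, not taken from the paper.

```python
# Sketch of an LLM-generated plan for a hypothetical command such as
# "put the bottle in the drawer". The motion stubs stand in for
# basic-library functions; YOLO detections are mocked as a dict.

def move_to(target: str) -> None:
    print(f"moving to {target}")


def grasp(obj: str) -> None:
    print(f"grasping {obj}")


def release(obj: str) -> None:
    print(f"releasing {obj}")


def pick_bottle(scene: dict) -> None:
    # Sub-task 1: approach the detected bottle and grasp it.
    move_to(scene["bottle"])
    grasp("bottle")


def place_in_drawer(scene: dict) -> None:
    # Sub-task 2: carry the bottle to the drawer and release it.
    move_to(scene["drawer"])
    release("bottle")


def execute_task(scene: dict) -> None:
    # Hierarchical plan: the high-level command is decomposed into
    # sub-tasks, each a short sequence of basic-library motion calls.
    pick_bottle(scene)
    place_in_drawer(scene)


if __name__ == "__main__":
    # Hypothetical perception output: object name -> detected location label.
    detections = {"bottle": "table", "drawer": "cabinet"}
    execute_task(detections)
```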
The system was tested on the Toyota Human Support Robot, demonstrating an average success rate of 79.5%, with 99.4% executability and 97.5% feasibility across various tasks. The results show that the system can effectively translate language commands into robot motions and integrate operator instructions to accomplish tasks that would otherwise be unachievable, marking a significant step toward improving the performance of LLM-based robots in real-world environments. Future research will focus on integrating LIDAR-derived point clouds and tactile sensing to further enhance the proposed LLM-based robot's performance in real-world settings.