DrEureka: Language Model Guided Sim-To-Real Transfer

4 Jun 2024 | Yecheng Jason Ma*1, William Liang*1, Hung-Ju Wang1, Sam Wang1, Yuke Zhu2,3, Linxi "Jim" Fan2, Osbert Bastani1, Dinesh Jayaraman1
**Abstract:** Transferring policies learned in simulation to the real world is a promising strategy for acquiring robot skills at scale. However, sim-to-real approaches typically rely on manual design and tuning of the task reward function and simulation physics parameters, which is slow and labor-intensive. This paper investigates using Large Language Models (LLMs) to automate and accelerate sim-to-real design. DrEureka requires only the physics simulation for the target task and automatically constructs suitable reward functions and domain randomization distributions to support real-world transfer. The approach discovers sim-to-real configurations competitive with human-designed ones on quadruped locomotion and dexterous manipulation tasks. It also solves novel robot tasks, such as quadruped balancing and walking on a yoga ball, without iterative manual design.

**Introduction:** The paper introduces DrEureka, an LLM-guided sim-to-real algorithm that automates reward design and domain randomization for sim-to-real transfer. DrEureka decomposes the optimization into three stages: synthesizing reward functions, constructing reward-aware physics priors, and generating domain randomization configurations. The method is evaluated on quadruped and dexterous manipulator platforms, showing that it applies across diverse robots and tasks.

**Related Work:** The paper reviews existing work on large language models for robotics, domain randomization, and sim-to-real robot learning, highlighting the need for automated design in sim-to-real transfer.

**Problem Setting:** The sim-to-real design problem is formalized, focusing on reward design and domain randomization. The goal is to train a policy in simulation and transfer it to the real world without further training.

**Method:** DrEureka uses Eureka, a state-of-the-art LLM-based reward design algorithm, to generate reward functions. Safety instructions are included in the prompt to encourage stable and safe behavior. A reward-aware physics prior (RAPP) is constructed to guide domain randomization, and the LLM generates domain randomization configurations based on this prior.

**Experimental Setup:** The evaluation platforms are commercially available, low-cost robots with well-supported open-source simulators. The tasks are quadrupedal locomotion and dexterous manipulation. The methods and experimental setup are detailed, including the use of GPT-4 as the LLM backbone.

**Results and Analysis:** DrEureka is compared to human-designed configurations and ablated versions. The results show that DrEureka outperforms human-designed configurations in both forward velocity and distance traveled on the track. For dexterous manipulation, DrEureka's best policy performs nearly 300% more in-hand cube rotations than the human-developed policy. DrEureka also successfully performs the challenging task of walking on a yoga ball, demonstrating its ability to handle novel and complex tasks.

**Conclusion:** DrEureka demonstrates the potential of using LLMs to automate...
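**Illustrative sketch:** The Method paragraph describes a physics prior that guides domain randomization, but this summary does not say how that prior is computed. The Python sketch below shows one plausible reading under stated assumptions: sweep candidate values of a physics parameter, keep the span over which an already-trained policy still scores above a threshold, then sample randomized physics from that span. All names (`feasible_range`, `sample_physics`, `toy_score`), the candidate grid, and the threshold are hypothetical. The uniform sampling is a stand-in for the LLM-generated randomization configuration, and `toy_score` is a stand-in for rolling out a policy in a simulator.

```python
import random
from typing import Callable, Dict, List, Tuple

Range = Tuple[float, float]


def feasible_range(evaluate: Callable[[float], float],
                   candidates: List[float],
                   threshold: float) -> Range:
    """Return (min, max) over the candidate values whose score meets the threshold."""
    ok = [v for v in candidates if evaluate(v) >= threshold]
    if not ok:
        raise ValueError("no candidate value met the threshold")
    return (min(ok), max(ok))


def sample_physics(ranges: Dict[str, Range], rng: random.Random) -> Dict[str, float]:
    """Draw one set of physics parameters uniformly from the given ranges."""
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in ranges.items()}


def toy_score(friction: float) -> float:
    """Hypothetical stand-in for a simulated rollout of a trained policy."""
    return 1.0 if 0.2 <= friction <= 1.5 else 0.0


# Assumed coarse sweep over friction values (illustrative numbers only).
candidates = [0.01, 0.1, 0.2, 0.5, 1.0, 1.5, 5.0, 10.0]
prior = {"friction": feasible_range(toy_score, candidates, threshold=0.5)}

# One randomized physics draw, e.g. at the start of a training episode.
rng = random.Random(0)
params = sample_physics(prior, rng)
```

With the toy scorer, `prior["friction"]` is `(0.2, 1.5)`, and every sampled friction value falls inside that interval. In a real pipeline the scorer would be an actual simulator rollout, and the ranges used for training would be chosen within the prior rather than set equal to it.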