RL-GPT: Integrating Reinforcement Learning and Code-as-policy

29 Feb 2024 | Shaoteng Liu, Haoqi Yuan, Minda Hu, Yanwei Li, Yukang Chen, Shu Liu, Zongqing Lu, Jiaya Jia
The paper introduces RL-GPT, a framework that integrates Large Language Models (LLMs) and Reinforcement Learning (RL) to extend LLMs' capabilities in complex, embodied environments. RL-GPT divides the work between two LLM-based agents. The slow agent decomposes a task into sub-actions and decides which of them can be implemented as code, leaving the rest to be learned with RL. The fast agent then writes the code and carries out the tasks. This two-level hierarchy is designed to make learning both efficient and effective, and the framework outperforms traditional RL methods and existing GPT agents. In its Minecraft evaluation, RL-GPT obtains diamonds within a single day using only one RTX 3090 GPU. The paper also analyzes the framework's components, including the two-loop iteration mechanism and the RL interface, and presents ablation studies that validate the contribution of each component.
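To make the slow/fast division of labor concrete, here is a minimal Python sketch of the two-level flow under heavy assumptions. Both LLM agents are replaced by plain functions, and the slow agent's judgment is a hard-coded lookup table (`CODABLE_HINTS`). Every name here (`SubAction`, `slow_agent`, `fast_agent`, `build_policy`, `train_rl_policy`) is hypothetical and not taken from RL-GPT's code. The sketch also leaves out the two-loop iteration mechanism and any actual RL training.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class SubAction:
    name: str
    codable: bool  # slow agent's verdict: script it, or leave it to RL


# Hypothetical stand-in for the slow agent's LLM judgment: a fixed table
# instead of a GPT prompt. Unknown sub-actions default to RL.
CODABLE_HINTS: Dict[str, bool] = {
    "find_tree": False,    # open-ended visual search, hard to script
    "attack_block": True,  # repeated attack presses, easy to script
}


def slow_agent(task: str, decompositions: Dict[str, List[str]]) -> List[SubAction]:
    """Split a task into sub-actions and mark which ones can be coded."""
    steps = decompositions.get(task, [task])
    return [SubAction(s, CODABLE_HINTS.get(s, False)) for s in steps]


def fast_agent(sub: SubAction) -> str:
    """Return a code stub for codable steps, or an RL job spec otherwise.
    `train_rl_policy` is a hypothetical name, not an RL-GPT API."""
    if sub.codable:
        return f"def {sub.name}(env):\n    ...  # scripted policy goes here"
    return f"train_rl_policy(name={sub.name!r})"


def build_policy(task: str, decompositions: Dict[str, List[str]]) -> List[Tuple[str, str, str]]:
    """Two-level flow: the slow agent plans, the fast agent handles each step."""
    return [
        (sub.name, "code" if sub.codable else "rl", fast_agent(sub))
        for sub in slow_agent(task, decompositions)
    ]


if __name__ == "__main__":
    decomp = {"harvest_log": ["find_tree", "attack_block"]}
    for name, kind, artifact in build_policy("harvest_log", decomp):
        print(f"{name} [{kind}]\n{artifact}\n")
```

In the framework described above, both agents are LLM prompts and the RL-routed steps are actually trained in the environment. The sketch only illustrates how a single plan splits into coded parts and learned parts.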