10 Jul 2017 | Nicolas Heess, Dhruva TB, Srinivasan Sriram, Jay Lemmon, Josh Merel, Greg Wayne, Yuval Tassa, Tom Erez, Ziyu Wang, S. M. Ali Eslami, Martin Riedmiller, David Silver
This paper explores how complex behaviors can emerge in reinforcement learning when agents are trained in rich and diverse environments. The authors investigate whether simple reward functions can lead to the development of sophisticated locomotion skills, such as jumping, crouching, and turning, without the need for careful reward design or explicit demonstration data. They train agents on a variety of challenging terrains and obstacles using a simple reward function based on forward progress. The agents learn to perform these behaviors robustly across different tasks, demonstrating the effectiveness of their approach. The paper also introduces a scalable variant of policy gradient reinforcement learning, called Distributed Proximal Policy Optimization (DPPO), which improves upon existing algorithms in terms of both performance and scalability. The results show that training on diverse terrains and obstacles can enhance the learning process and result in more robust and effective behaviors.This paper explores how complex behaviors can emerge in reinforcement learning when agents are trained in rich and diverse environments. The authors investigate whether simple reward functions can lead to the development of sophisticated locomotion skills, such as jumping, crouching, and turning, without the need for careful reward design or explicit demonstration data. They train agents on a variety of challenging terrains and obstacles using a simple reward function based on forward progress. The agents learn to perform these behaviors robustly across different tasks, demonstrating the effectiveness of their approach. The paper also introduces a scalable variant of policy gradient reinforcement learning, called Distributed Proximal Policy Optimization (DPPO), which improves upon existing algorithms in terms of both performance and scalability. The results show that training on diverse terrains and obstacles can enhance the learning process and result in more robust and effective behaviors.