27 May 2016 | Yan Duan, Xi Chen, Rein Houthooft, John Schulman, Pieter Abbeel
This paper presents a benchmark suite of continuous control tasks for reinforcement learning. The tasks range from basic problems such as cart-pole balancing to challenging ones involving high-degree-of-freedom locomotion, partial observability, and hierarchical structure. The authors evaluate a range of reinforcement learning algorithms on these tasks to assess their effectiveness at training deep neural network policies. The benchmark and reference implementations are available at https://github.com/rllab/rllab to facilitate experimental reproducibility and encourage further research.
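To give a sense of how the reference implementations are used, here is a minimal training script modeled on the rllab quickstart; the module paths and constructor arguments reflect the repository at the time of writing and may since have changed:

```python
from rllab.algos.trpo import TRPO
from rllab.baselines.linear_feature_baseline import LinearFeatureBaseline
from rllab.envs.box2d.cartpole_env import CartpoleEnv
from rllab.envs.normalized_env import normalize
from rllab.policies.gaussian_mlp_policy import GaussianMLPPolicy

# Cart-pole balancing with a small Gaussian MLP policy trained by TRPO.
env = normalize(CartpoleEnv())
policy = GaussianMLPPolicy(env_spec=env.spec, hidden_sizes=(32, 32))
baseline = LinearFeatureBaseline(env_spec=env.spec)

algo = TRPO(
    env=env,
    policy=policy,
    baseline=baseline,
    batch_size=4000,       # state-action samples collected per iteration
    max_path_length=100,   # episode horizon
    n_itr=40,              # number of training iterations
    discount=0.99,
    step_size=0.01,        # KL constraint on each policy update
)
algo.train()
```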
The benchmark comprises 31 continuous control tasks in four categories: basic tasks, locomotion tasks, partially observable tasks, and hierarchical tasks. The tasks are implemented with physics simulators rather than hand-coded symbolic dynamics, which makes them easier to modify and allows complex behavior to be modeled more accurately. Tasks with simple dynamics use Box2D, an open-source 2D physics engine, while tasks with more complex, contact-rich dynamics use MuJoCo.
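Both simulator back-ends sit behind the same environment interface, so switching a Box2D task for a MuJoCo one is a one-line change. A minimal sketch, assuming the rllab module layout at the time of writing:

```python
# Box2D back-end: 2D tasks with simple dynamics (open-source engine).
from rllab.envs.box2d.cartpole_env import CartpoleEnv
# MuJoCo back-end: 3D tasks with contact-rich dynamics (MuJoCo required).
from rllab.envs.mujoco.swimmer_env import SwimmerEnv
from rllab.envs.normalized_env import normalize

# Both back-ends expose the same interface, so an algorithm configured for
# one task can be pointed at another by swapping the environment class.
env = normalize(SwimmerEnv())   # or normalize(CartpoleEnv())
obs = env.reset()
step = env.step(env.action_space.sample())  # yields observation, reward, done
```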
The paper evaluates several reinforcement learning algorithms: batch algorithms such as REINFORCE, Truncated Natural Policy Gradient (TNPG), Reward-Weighted Regression (RWR), Relative Entropy Policy Search (REPS), and Trust Region Policy Optimization (TRPO); the online algorithm Deep Deterministic Policy Gradient (DDPG); and gradient-free methods such as the Cross-Entropy Method (CEM) and Covariance Matrix Adaptation Evolution Strategy (CMA-ES). The results show that TNPG, TRPO, and DDPG are effective at training deep neural network policies, although they, like the other methods, perform poorly on the hierarchical tasks.
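To make the simplest of these algorithms concrete, the sketch below applies the REINFORCE score-function estimator to a toy one-step continuous control problem. Everything here is illustrative and assumed, not taken from the benchmark: the problem, the linear-Gaussian policy, and all parameter values.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy one-step problem: state x ~ U(-1, 1), reward r = -(x + a)^2, so the
# optimal linear policy has slope -1. Gaussian policy: a ~ N(theta * x, sigma^2).
theta, sigma, lr = 0.5, 0.3, 0.05

for _ in range(200):
    score, rews = [], []
    for _ in range(64):                               # batch of episodes
        x = rng.uniform(-1.0, 1.0)
        a = rng.normal(theta * x, sigma)
        score.append((a - theta * x) * x / sigma**2)  # d/dtheta log pi(a|x)
        rews.append(-(x + a) ** 2)
    b = np.mean(rews)                                 # baseline reduces variance
    theta += lr * np.mean([s * (r - b) for s, r in zip(score, rews)])

print(f"learned slope: {theta:.2f} (optimum: -1.0)")
```

The score-function trick, weighting the gradient of the log-likelihood by the (baseline-subtracted) return, is what REINFORCE and its natural-gradient refinements like TNPG and TRPO all build on.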
The authors also discuss the challenges of continuous control, including the need for efficient algorithms that can handle high-dimensional state and action spaces, and the importance of benchmarks in evaluating and improving reinforcement learning algorithms. The benchmark provides a standardized set of tasks for researchers to evaluate their algorithms and encourages further development of new methods for continuous control.