5 Jul 2019 | Timothy P. Lillicrap, Jonathan J. Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver & Daan Wierstra
This paper presents Deep Deterministic Policy Gradient (DDPG), a model-free, off-policy actor-critic algorithm for learning policies in high-dimensional, continuous action spaces. The algorithm extends the deterministic policy gradient (DPG) method with ideas from the Deep Q-Network (DQN) algorithm to improve stability and performance. DDPG uses deep neural network function approximators for both the actor (the policy) and the critic (the action-value function), and it can learn either from low-dimensional state descriptions or directly from raw pixel inputs, making it suitable for complex physical control tasks.
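For concreteness, here is a minimal sketch of the actor, the critic, and the core DDPG update in PyTorch. The layer sizes, function names, and hyperparameters are illustrative assumptions for this summary, not the paper's exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Actor(nn.Module):
    """Deterministic policy mu(s): maps a state to one action in [-1, 1]^d."""
    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim), nn.Tanh(),
        )

    def forward(self, state):
        return self.net(state)

class Critic(nn.Module):
    """Action-value function Q(s, a): maps a state-action pair to a scalar."""
    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))

def ddpg_update(batch, actor, critic, actor_target, critic_target,
                actor_opt, critic_opt, gamma=0.99):
    """One actor-critic update on a minibatch of (s, a, r, s2, done) tensors
    (rewards and done flags assumed to have shape [batch, 1])."""
    s, a, r, s2, done = batch

    # Critic: regress Q(s, a) toward the bootstrapped target computed with the
    # target networks, r + gamma * Q'(s', mu'(s')) for non-terminal transitions.
    with torch.no_grad():
        target_q = r + gamma * (1.0 - done) * critic_target(s2, actor_target(s2))
    critic_loss = F.mse_loss(critic(s, a), target_q)
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()

    # Actor: deterministic policy gradient -- ascend Q(s, mu(s)) with respect to
    # the actor's parameters, i.e. minimise its negation.
    actor_loss = -critic(s, actor(s)).mean()
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()
```

In the pixel-based variant the two MLPs above would be preceded by convolutional layers over the image observations; the update itself is unchanged.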
The algorithm is tested on a variety of simulated physics tasks, including classic problems such as cartpole swing-up, as well as dexterous manipulation, legged locomotion, and car driving. It finds policies that are competitive with those found by planning algorithms with full access to the dynamics of the domain. The algorithm can also learn policies "end-to-end" from raw pixel inputs, without manual feature extraction.
DDPG uses a replay buffer to store transitions from the environment and samples minibatches from it during training, which breaks up correlations between consecutive samples and improves stability. It also maintains slowly updated target networks for both the actor and the critic to compute the bootstrapped target values, which helps prevent divergence during learning; unlike DQN's periodic hard copies, these targets track the learned networks via "soft" updates. Finally, the algorithm incorporates batch normalization so that learning is robust to the differing scales and units of state observations across environments.
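A sketch of those two stabilizers follows, assuming the torch modules from the previous snippet; the buffer capacity and the soft-update rate tau are typical DDPG settings rather than values quoted from the paper.

```python
import random
from collections import deque
import torch

class ReplayBuffer:
    """Fixed-capacity FIFO store of (s, a, r, s2, done) transitions,
    sampled uniformly to break the temporal correlation between updates."""
    def __init__(self, capacity=1_000_000):
        self.buffer = deque(maxlen=capacity)

    def add(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size):
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)

def soft_update(target_net, online_net, tau=0.001):
    """Move the target network a small step toward the online network:
    theta_target <- tau * theta + (1 - tau) * theta_target."""
    with torch.no_grad():
        for tp, p in zip(target_net.parameters(), online_net.parameters()):
            tp.mul_(1.0 - tau).add_(tau * p)
```

After each learning step, soft_update is applied to both the target actor and the target critic. Batch normalization can be added by inserting nn.BatchNorm1d layers on the state pathway of the networks, along the lines the paper describes for its low-dimensional experiments.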
The evaluation covers physical control tasks involving contacts, locomotion, and grasping, and DDPG learns effective policies for many of them even when given only raw pixel observations. Compared against a planning algorithm with full access to the dynamics of the domain, it sometimes finds policies that exceed the planner's performance.
The paper also discusses related work in the field of reinforcement learning, including other model-free policy search methods and model-based approaches. It concludes that DDPG is a promising approach for scaling reinforcement learning to large, high-dimensional domains.