23 Nov 2016 | Shixiang Gu*, Ethan Holly*, Timothy Lillicrap, and Sergey Levine
This paper presents a deep reinforcement learning (DRL) approach for robotic manipulation based on asynchronous off-policy updates. The method enables real robots to learn complex 3D manipulation skills without prior demonstrations or task-specific representations. The key contribution is an asynchronous variant of the Normalized Advantage Functions (NAF) algorithm, in which multiple robots learn a single shared policy in parallel: each robot contributes experience to a shared replay buffer while off-policy updates are applied asynchronously, substantially reducing wall-clock training time. The approach is evaluated on both simulated and real-world tasks, including door opening, where it achieves a 100% success rate without human demonstrations. Policies and value functions are represented by deep neural networks, which converge faster and perform better than linear models on the more complex tasks. The method is robust across different robotic environments, and the results show that asynchronous training with multiple robots significantly improves both learning speed and final policy performance, making it a promising approach for real-world robotic manipulation.
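The abstract names the Normalized Advantage Functions algorithm without writing out its parameterization. Below is a minimal NumPy sketch of the quadratic-advantage Q-function that NAF uses, Q(x, u) = V(x) - 1/2 (u - mu(x))^T P(x) (u - mu(x)) with P(x) = L(x) L(x)^T positive-definite; the function name naf_q_value and the toy inputs are illustrative stand-ins, not the authors' code. In the asynchronous variant described above, a Q-network of this form would be trained by a central thread sampling from a replay buffer that several robot workers fill in parallel.

```python
# Minimal sketch of the NAF Q-function decomposition (assumed NumPy implementation):
# Q(x, u) = V(x) + A(x, u), with A(x, u) = -1/2 (u - mu(x))^T P(x) (u - mu(x)),
# where P(x) = L(x) L(x)^T and L(x) is lower-triangular with a positive diagonal.
# The values v, mu, and L_entries stand in for the outputs of a neural network.
import numpy as np

def naf_q_value(v, mu, L_entries, u):
    """Compute Q(x, u) from the scalar V(x), mean action mu(x),
    the lower-triangular entries of L(x), and a candidate action u."""
    dim = mu.shape[0]
    L = np.zeros((dim, dim))
    L[np.tril_indices(dim)] = L_entries
    # Exponentiate the diagonal so P = L L^T is positive-definite.
    L[np.diag_indices(dim)] = np.exp(np.diag(L))
    P = L @ L.T
    diff = u - mu
    advantage = -0.5 * diff @ P @ diff   # always <= 0, equals 0 at u = mu(x)
    return v + advantage

# Toy usage with made-up network outputs for a 3-D action space.
rng = np.random.default_rng(0)
dim = 3
v = 1.7                                  # V(x) from the value head
mu = rng.normal(size=dim)                # mu(x) from the policy head
L_entries = rng.normal(size=dim * (dim + 1) // 2)
u = rng.normal(size=dim)
print(naf_q_value(v, mu, L_entries, u))  # Q is maximized at u = mu(x)
```

Because the greedy action is simply mu(x), this parameterization keeps the policy and the Q-function in one network, which is part of what makes sharing a single set of weights across several asynchronously acting robots straightforward.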