23 Nov 2016 | Shixiang Gu*, Ethan Holly*, Timothy Lillicrap, and Sergey Levine
This paper presents a deep reinforcement learning (DRL) approach for robotic manipulation based on asynchronous off-policy updates. The method enables real robots to learn complex 3D manipulation skills without prior demonstrations or task-specific representations. The key contribution is an asynchronous variant of the Normalized Advantage Functions (NAF) algorithm, in which multiple robots learn a single shared policy in parallel: each robot contributes experience to a shared replay buffer while off-policy updates are applied asynchronously, substantially reducing wall-clock training time. The approach is evaluated on both simulated and real-world tasks, including door opening, where it achieves a 100% success rate without human demonstrations. Policies and value functions are represented by deep neural networks, which converge faster and perform better than linear models on the more complex tasks. The method is robust across different robotic environments, and the results show that asynchronous training with multiple robots significantly improves both learning speed and final policy performance, making it a promising approach for real-world robotic manipulation.
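The abstract names the Normalized Advantage Functions algorithm without writing out its parameterization. Below is a minimal NumPy sketch of the quadratic-advantage Q-function that NAF uses, Q(x, u) = V(x) - 1/2 (u - mu(x))^T P(x) (u - mu(x)) with P(x) = L(x) L(x)^T positive-definite; the function name naf_q_value and the toy inputs are illustrative stand-ins, not the authors' code. In the asynchronous variant described above, a Q-network of this form would be trained by a central thread sampling from a replay buffer that several robot workers fill in parallel.

```python
# Minimal sketch of the NAF Q-function decomposition (assumed NumPy implementation):
# Q(x, u) = V(x) + A(x, u), with A(x, u) = -1/2 (u - mu(x))^T P(x) (u - mu(x)),
# where P(x) = L(x) L(x)^T and L(x) is lower-triangular with a positive diagonal.
# The values v, mu, and L_entries stand in for the outputs of a neural network.
import numpy as np

def naf_q_value(v, mu, L_entries, u):
    """Compute Q(x, u) from the scalar V(x), mean action mu(x),
    the lower-triangular entries of L(x), and a candidate action u."""
    dim = mu.shape[0]
    L = np.zeros((dim, dim))
    L[np.tril_indices(dim)] = L_entries
    # Exponentiate the diagonal so P = L L^T is positive-definite.
    L[np.diag_indices(dim)] = np.exp(np.diag(L))
    P = L @ L.T
    diff = u - mu
    advantage = -0.5 * diff @ P @ diff   # always <= 0, equals 0 at u = mu(x)
    return v + advantage

# Toy usage with made-up network outputs for a 3-D action space.
rng = np.random.default_rng(0)
dim = 3
v = 1.7                                  # V(x) from the value head
mu = rng.normal(size=dim)                # mu(x) from the policy head
L_entries = rng.normal(size=dim * (dim + 1) // 2)
u = rng.normal(size=dim)
print(naf_q_value(v, mu, L_entries, u))  # Q is maximized at u = mu(x)
```

Because the greedy action is simply mu(x), this parameterization keeps the policy and the Q-function in one network, which is part of what makes sharing a single set of weights across several asynchronously acting robots straightforward.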