26 Jun 2018 | Aravind Rajeswaran*, Vikash Kumar*, Abhishek Gupta, Giulia Vezzani, John Schulman, Emanuel Todorov, Sergey Levine
This paper presents a method for learning complex dexterous manipulation tasks using deep reinforcement learning (DRL) combined with human demonstrations. The goal is to enable a 24-degree-of-freedom (DoF) robotic hand to perform tasks such as object relocation, in-hand manipulation, tool use, and door opening. The method, called Demonstration Augmented Policy Gradient (DAPG), integrates human demonstrations into the policy gradient learning process to reduce sample complexity and improve policy robustness.
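In the paper's formulation, DAPG augments the standard policy-gradient estimator with a second term computed over the demonstration data, weighted by a coefficient that decays geometrically with the training iteration, so the demonstrations guide early learning and fade as the policy improves. Below is a minimal sketch of that objective; the `policy.log_prob(states, actions)` helper and the batch layout are illustrative assumptions, not the authors' released code.

```python
import torch

def dapg_loss(policy, states, actions, advantages,
              demo_states, demo_actions, iteration,
              lam0=0.1, lam1=0.95):
    """Sketch of the demonstration-augmented surrogate objective.

    First term: the usual advantage-weighted log-likelihood over
    on-policy rollouts. Second term: log-likelihood of demonstrated
    actions, scaled by lam0 * lam1**iteration times the largest
    advantage in the rollout batch, so the demo term dominates early
    in training and decays over iterations.
    """
    pg_term = (policy.log_prob(states, actions) * advantages).mean()

    demo_weight = lam0 * (lam1 ** iteration) * advantages.max().detach()
    demo_term = demo_weight * policy.log_prob(demo_states, demo_actions).mean()

    # Gradient ascent on (pg_term + demo_term); return the negation
    # so a standard minimizing optimizer can be used.
    return -(pg_term + demo_term)
```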
The paper first introduces four dexterous manipulation tasks that are representative of real-world scenarios. These tasks involve high-dimensional state and action spaces and complex contact dynamics, and they require precise control of the robotic hand. The tasks are simulated in the MuJoCo physics engine, with the hand modeled as a 24-DoF anthropomorphic platform.
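For reference, the public code release exposes these tasks as standard Gym environments. The sketch below assumes the `mj_envs` package and the task IDs used in that release (`relocate-v0`, `door-v0`, `hammer-v0`, `pen-v0`); exact names may differ across versions.

```python
import gym
import mj_envs  # registers the hand-manipulation tasks with Gym (assumed package)

# Task IDs as in the public release; treat them as assumptions if your
# installation registers different names.
for task_id in ["relocate-v0", "door-v0", "hammer-v0", "pen-v0"]:
    env = gym.make(task_id)
    obs = env.reset()
    print(task_id,
          "observation dim:", env.observation_space.shape,
          "action dim (hand joints plus arm/base actuation):", env.action_space.shape)
    env.close()
```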
The paper then describes the experimental setup, including the ADROIT hand and the simulation environment. It discusses the challenges of applying DRL to dexterous manipulation, notably the large number of samples required and the difficulty of obtaining robust policies. The paper also compares DRL methods with and without demonstrations, showing that DAPG significantly reduces sample complexity and improves policy robustness.
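Before the augmented policy-gradient fine-tuning sketched above, DAPG first pretrains the policy with behavior cloning on the demonstrations, which provides the initial head start over learning from scratch. A minimal sketch, assuming the same hypothetical `policy.log_prob` interface and demonstrations stored as (state, action) tensor batches:

```python
import torch

def behavior_cloning(policy, demo_pairs, epochs=10, lr=1e-3):
    """Maximum-likelihood pretraining on demonstrated (state, action) pairs.

    demo_pairs: iterable of (states, actions) tensor batches. The policy
    is assumed to expose log_prob(states, actions) -> per-sample log pi(a|s).
    """
    optimizer = torch.optim.Adam(policy.parameters(), lr=lr)
    for _ in range(epochs):
        for states, actions in demo_pairs:
            loss = -policy.log_prob(states, actions).mean()  # negative log-likelihood
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return policy
```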
The paper presents results showing that DAPG outperforms other DRL methods, including DDPG from Demonstrations (DDPGfD), in terms of sample efficiency and policy robustness. It also demonstrates that policies learned with demonstrations are more human-like and robust to environmental variations. The paper concludes that DAPG is a viable approach for real-world learning of complex dexterous manipulation tasks, and that further research is needed to apply this method to physical hardware systems.