28 Nov 2018 | Dmitry Kalashnikov, Alex Irpan, Peter Pastor, Julian Ibarz, Alexander Herzog, Eric Jang, Deirdre Quillen, Ethan Holly, Mrinal Kalakrishnan, Vincent Vanhoucke, Sergey Levine
QT-Opt is a scalable deep reinforcement learning framework for vision-based robotic manipulation, designed specifically for grasping. It enables closed-loop vision-based control: the robot continuously updates its grasp strategy from real-time observations to optimize long-horizon grasp success. Trained on over 580,000 real-world grasp attempts, the learned Q-function, a deep neural network with over 1.2 million parameters, achieves a 96% grasp success rate on unseen objects.

The resulting policy exhibits behaviors rarely seen in standard grasping systems, including regrasping, probing objects to find a good grasp, repositioning objects before grasping, and responding dynamically to disturbances. It operates on raw monocular RGB observations from a single over-the-shoulder camera, demonstrating effective learning with minimal sensing.

The task is formulated as a Markov Decision Process (MDP), and the method uses a continuous-action generalization of Q-learning with a distributed, asynchronous implementation for scalability. Rather than training an explicit actor network, QT-Opt applies stochastic optimization over the critic both to select actions at execution time and to compute target values during training (see the sketches below).

Evaluated on real robotic systems, QT-Opt shows high success rates and effective generalization to new objects. The results demonstrate that complex grasping strategies, including pre-grasp manipulation and dynamic reactions to perturbations, can emerge from self-supervised data collection and off-policy training. The framework itself is generic, suggesting that future work could extend the approach to a wide range of other manipulation skills.
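The stochastic optimizer the paper uses over the critic is the cross-entropy method (CEM). Below is a minimal Python sketch of that idea; the `q_function` callable and the Gaussian parameterization are assumptions of this sketch, the discount value is a placeholder, and the sample sizes follow the values reported in the paper (64 candidates, 6 elites, two iterations).

```python
import numpy as np

def cem_maximize_q(q_function, state, action_dim,
                   n_samples=64, n_elites=6, n_iters=2):
    """Approximate argmax_a Q(state, a) with the cross-entropy method (CEM).

    QT-Opt avoids an explicit actor network by running a derivative-free
    optimizer over the critic. The Gaussian parameterization here is an
    assumption of this sketch; `q_function` stands in for the learned
    Q-network.
    """
    mean = np.zeros(action_dim)
    std = np.ones(action_dim)
    for _ in range(n_iters):
        # Sample a batch of candidate actions from the current Gaussian.
        actions = mean + std * np.random.randn(n_samples, action_dim)
        scores = np.array([q_function(state, a) for a in actions])
        # Refit the Gaussian to the highest-scoring candidates (elites).
        elites = actions[np.argsort(scores)[-n_elites:]]
        mean, std = elites.mean(axis=0), elites.std(axis=0) + 1e-6
    return mean  # approximate maximizer of Q(state, .)

def bellman_target(q_target, transition, action_dim, gamma=0.9):
    """Q-learning target r + gamma * max_a' Q(s', a'), with the max
    approximated by the same CEM routine used to select actions.
    gamma=0.9 is an assumed placeholder, not a value from the paper."""
    s, a, r, s_next, done = transition
    if done:
        return r
    a_best = cem_maximize_q(q_target, s_next, action_dim)
    return r + gamma * q_target(s_next, a_best)
```

Using the same optimizer for both action selection and target computation keeps the method critic-only: there is no actor to train, and the policy is defined implicitly as the (approximate) argmax of the Q-function.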
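The distributed, asynchronous implementation separates target computation from gradient descent: dedicated "Bellman updater" jobs pull transitions from a replay buffer, label them with target values, and push them to a training buffer consumed by SGD workers. The single-process sketch below, reusing `bellman_target` from above, only illustrates that data flow; the thread-and-queue wiring and the dummy Q-function are assumptions, not the paper's actual infrastructure.

```python
import queue
import threading
import numpy as np

replay_queue = queue.Queue()  # raw (s, a, r, s', done) transitions
train_queue = queue.Queue()   # transitions relabeled with Bellman targets

def bellman_updater(q_target, action_dim):
    """Continuously relabel stored transitions with fresh targets,
    decoupling the (expensive) CEM target computation from SGD steps."""
    while True:
        transition = replay_queue.get()
        s, a, r, s_next, done = transition
        target = bellman_target(q_target, transition, action_dim)
        train_queue.put((s, a, target))

# Hypothetical stand-in for the learned Q-network.
def dummy_q(state, action):
    return -np.sum((action - state[:action.shape[0]]) ** 2)

# One updater thread; the real system runs many such jobs across machines.
threading.Thread(target=bellman_updater, args=(dummy_q, 4),
                 daemon=True).start()
replay_queue.put((np.zeros(8), np.zeros(4), 0.0, np.ones(8), False))
print(train_queue.get())  # (s, a, labeled target) ready for a training worker
```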