28 Nov 2018 | Dmitry Kalashnikov, Alex Irpan, Peter Pastor, Julian Ibarz, Alexander Herzog, Eric Jang, Deirdre Quillen, Ethan Holly, Mrinal Kalakrishnan, Vincent Vanhoucke, Sergey Levine
This paper presents QT-Opt, a scalable self-supervised vision-based reinforcement learning framework for robotic grasping. The method enables closed-loop vision-based control, allowing the robot to continuously update its grasp strategy based on recent observations to optimize long-horizon grasp success. QT-Opt leverages over 580k real-world grasp attempts to train a deep neural network Q-function with over 1.2M parameters, achieving a 96% grasp success rate on unseen objects. The system exhibits behaviors distinct from standard grasping systems, including automatic regrasping, probing, non-prehensile pre-grasp manipulations, and dynamic response to disturbances. The paper discusses the design of QT-Opt, its implementation, and experimental results, demonstrating its effectiveness in both quantitative and qualitative terms. The method is shown to outperform prior work in terms of grasp success rate and generalization to new objects, while also performing complex pre-grasp manipulations and handling dynamic disturbances.
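To make the closed-loop idea concrete, below is a minimal, hedged sketch of how a learned Q-function can drive grasp control: at every step the robot re-observes the scene and picks the action that maximizes Q(s, a), here with a simple cross-entropy-method optimizer (QT-Opt uses a CEM-style optimizer over continuous actions). The function and variable names (`q_function`, `cem_select_action`, `get_observation`, `execute`) and the toy scoring rule are illustrative placeholders, not the paper's actual implementation.

```python
# Minimal sketch of closed-loop action selection with a learned Q-function.
# All names and the toy Q-function are placeholders, not the paper's code.
import numpy as np

def q_function(observation, actions):
    """Placeholder for the learned Q(s, a) network.

    In QT-Opt this would be a deep network scoring camera images plus
    gripper-pose inputs; here actions are scored by distance to a hidden
    target so the sketch runs end to end.
    """
    target = np.array([0.2, -0.1, 0.05, 0.0])  # stand-in for a "good grasp" action
    return -np.linalg.norm(actions - target, axis=-1)

def cem_select_action(observation, action_dim=4, iters=3, samples=64, elites=6):
    """Approximate argmax_a Q(s, a) with the cross-entropy method."""
    mean, std = np.zeros(action_dim), np.ones(action_dim)
    for _ in range(iters):
        actions = np.random.normal(mean, std, size=(samples, action_dim))
        scores = q_function(observation, actions)
        elite = actions[np.argsort(scores)[-elites:]]  # keep the top-scoring actions
        mean, std = elite.mean(axis=0), elite.std(axis=0) + 1e-6
    return mean

# Closed-loop control: re-observe and re-plan at every step, so the policy can
# react to disturbances (e.g. an object being knocked away) instead of
# committing to a single open-loop grasp.
observation = None  # stand-in for the current camera image and robot state
for step in range(10):
    action = cem_select_action(observation)
    # execute(action); observation = get_observation()  # on a real robot
    print(step, np.round(action, 3))
```

Re-planning from fresh observations at every timestep is what enables the regrasping, probing, and disturbance-recovery behaviors described above, since the policy never commits to a grasp it cannot revise.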