[slides] End-to-End Training of Deep Visuomotor Policies

This paper presents a method for end-to-end training of deep visuomotor policies for robotic control. Perception and control are combined in a single policy, represented by a deep convolutional neural network (CNN) with 92,000 parameters. The policy is trained with guided policy search, which turns policy search into supervised learning, with the supervision provided by a trajectory-centric reinforcement learning method. The method is evaluated on a range of real-world manipulation tasks that require close coordination between vision and control, such as screwing a cap onto a bottle. It is also compared in simulation against prior policy search methods.

The key contributions are a guided policy search algorithm for sensorimotor deep learning, a novel CNN architecture designed for robotic control, and a demonstration that training visuomotor policies end to end improves performance over training the vision and control components separately. The method is sample-efficient, requiring only minutes of interaction time. It can train deep visuomotor policies for complex, high-dimensional manipulation skills with direct torque control.

The CNN automatically learns feature points that capture spatial information about the scene. It needs no supervision beyond the robot's encoder readings and camera images. A novel spatial softmax layer converts the pixel-wise feature maps into spatial coordinates, and these coordinates are then used to compute motor torques. The policy predicts actions from raw observations rather than from the full state of the system. The full state is needed only during training, so the trained policy can handle novel configurations in which that state is unknown. The method also includes a pretraining scheme that allows the policy to be trained in a relatively small number of iterations.
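The spatial softmax step described above can be illustrated with a minimal NumPy sketch. A softmax over the pixels of each feature map turns that map into a spatial probability distribution. The expected image coordinates under that distribution then become one 2D feature point per channel. The function name `spatial_softmax`, the normalization of coordinates to [-1, 1], and the absence of a learned temperature are assumptions for illustration, not details taken from this summary.

```python
import numpy as np


def spatial_softmax(features: np.ndarray) -> np.ndarray:
    """Map conv feature maps of shape (C, H, W) to C expected (x, y) points.

    Hypothetical sketch: coordinates are normalized to [-1, 1] (an assumption),
    and no learned temperature is applied to the logits.
    """
    c, h, w = features.shape
    flat = features.reshape(c, h * w)
    # Subtract the per-channel max before exponentiating for numerical stability.
    flat = flat - flat.max(axis=1, keepdims=True)
    probs = np.exp(flat)
    probs /= probs.sum(axis=1, keepdims=True)
    probs = probs.reshape(c, h, w)  # softmax over pixels, separately per channel

    # Image-space coordinate grids: xs indexes columns (width), ys indexes rows (height).
    xs = np.linspace(-1.0, 1.0, w)
    ys = np.linspace(-1.0, 1.0, h)
    # Expected coordinates: marginalize the 2D distribution onto each axis.
    expected_x = (probs.sum(axis=1) * xs).sum(axis=1)  # sum over rows -> (C, W)
    expected_y = (probs.sum(axis=2) * ys).sum(axis=1)  # sum over cols -> (C, H)
    # Flat feature vector [x_0, y_0, x_1, y_1, ...].
    return np.stack([expected_x, expected_y], axis=1).reshape(-1)


if __name__ == "__main__":
    fmap = np.full((2, 5, 5), -50.0)
    fmap[0, 0, 4] = 50.0  # channel 0 peaks at the top-right pixel (row 0, col 4)
    fmap[1, 2, 2] = 50.0  # channel 1 peaks at the center pixel
    print(spatial_softmax(fmap))  # approximately [1, -1, 0, 0]
```

Each channel collapses to a single point near the location where it responds most strongly. The resulting vector of feature points is what later layers would map to motor torques. Because the expectation is a smooth function of the activations, gradients from the control loss can flow back into the convolutional layers.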
The paper also describes how guided policy search uses the Bregman alternating direction method of multipliers (BADMM) to solve the underlying constrained optimization problem, and gives a detailed derivation of the resulting algorithm. The method learns complex robotic tasks effectively. In the experiments, end-to-end training yields more consistent behavior and better generalization than training the vision and control components separately. The authors present the approach as applicable to a wide range of robotic manipulation tasks.
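For reference, the constrained problem behind guided policy search can be written roughly as follows. Here p is the distribution over trajectories τ induced by the trajectory-centric controllers, ℓ is the cost, and π_θ is the neural network policy with parameters θ. This is a sketch of the general form with notation chosen for illustration.

```latex
\min_{p,\,\pi_\theta} \; \mathbb{E}_{p(\tau)}\big[\ell(\tau)\big]
\quad \text{s.t.} \quad
p(\mathbf{u}_t \mid \mathbf{x}_t) = \pi_\theta(\mathbf{u}_t \mid \mathbf{x}_t)
\quad \forall\, \mathbf{x}_t, \mathbf{u}_t, t
```

BADMM handles the equality constraint by alternating between three steps. The first updates the policy parameters θ, which amounts to supervised regression onto the controllers' behavior. The second updates the trajectory distributions p. The third updates the Lagrange multipliers on the constraint, with KL-divergence terms playing the role of the Bregman divergences.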

10/15; Published 4/16 | Sergey Levine, Chelsea Finn, Trevor Darrell, Pieter Abbeel