10/15; Published 4/16 | Sergey Levine, Chelsea Finn, Trevor Darrell, Pieter Abbeel
This paper addresses the question of whether training perception and control systems jointly end-to-end provides better performance than training each component separately. The authors develop a method to learn policies that map raw image observations directly to torques at the robot's motors using deep convolutional neural networks (CNNs) with 92,000 parameters. The policies are trained using a guided policy search method, which transforms policy search into supervised learning with supervision provided by a trajectory-centric reinforcement learning method. The method is evaluated on various real-world manipulation tasks requiring close coordination between vision and control, such as screwing a cap onto a bottle. The results demonstrate improvements in consistency and generalization when training visuomotor policies end-to-end compared to training the vision and control components separately. The paper also includes a comparison with prior policy search methods, showing that guided policy search outperforms them in training high-dimensional neural network policies.
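The idea of turning policy search into supervised learning can be illustrated with a toy sketch. This is not the authors' implementation: it assumes a one-dimensional state, a hand-written linear feedback law standing in for the trajectory-centric controller, and a linear policy in place of the CNN. It also leaves out the image input and any interaction between the two stages. All names (`teacher_action`, `collect_supervision`, `fit_linear_policy`) and the dynamics are hypothetical.

```python
import random


def teacher_action(x, target, gain=0.5):
    """Hypothetical stand-in for a trajectory-centric controller:
    a simple linear feedback law that pushes the state toward the target."""
    return -gain * (x - target)


def collect_supervision(num_rollouts=5, horizon=20, target=1.0, seed=0):
    """Roll out the teacher from several start states and record
    (state, action) pairs to use as supervised training data."""
    rng = random.Random(seed)
    data = []
    for _ in range(num_rollouts):
        x = rng.uniform(-2.0, 2.0)
        for _ in range(horizon):
            u = teacher_action(x, target)
            data.append((x, u))
            x = x + u  # assumed toy dynamics: the action shifts the state
    return data


def fit_linear_policy(data):
    """Supervised step: closed-form least squares for u = w * x + b."""
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_u = sum(u for _, u in data) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in data)
    cov_xu = sum((x - mean_x) * (u - mean_u) for x, u in data)
    w = cov_xu / var_x
    b = mean_u - w * mean_x
    return w, b


data = collect_supervision()
w, b = fit_linear_policy(data)
print(f"learned policy: u = {w:.3f} * x + {b:.3f}")
```

Because the stand-in teacher is itself linear in the state, the regression recovers its gain and offset; with a nonlinear policy class such as a CNN, the same supervised step would be carried out with gradient-based training instead of a closed-form fit.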