Guided Cost Learning: Deep Inverse Optimal Control via Policy Optimization

27 May 2016 | Chelsea Finn, Sergey Levine, Pieter Abbeel
This paper introduces guided cost learning, a method that uses policy optimization to learn complex behaviors from expert demonstrations. It addresses two key challenges in inverse optimal control (IOC): learning arbitrary nonlinear cost functions without manual feature engineering, and learning cost functions under unknown dynamics for high-dimensional systems. The approach combines sample-based maximum entropy IOC with forward reinforcement learning using time-varying linear models. Trajectories are sampled adaptively to estimate the IOC partition function, and policy optimization steers the sampling distribution toward the regions that are most useful for that estimate, as formalized below.
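Concretely, maximum entropy IOC models demonstrations as drawn from p(τ) ∝ exp(−c_θ(τ)) and minimizes their negative log-likelihood, with the intractable partition function estimated by importance sampling. The formulation below is a standard statement of this objective rather than a verbatim transcription from the paper:

```latex
% MaxEnt IOC: demonstrations are modeled as p(\tau) = \exp(-c_\theta(\tau)) / Z.
% Negative log-likelihood of N demos, with Z estimated from M trajectories
% \tau_j drawn from an adaptive sampling distribution q:
\mathcal{L}_{\text{IOC}}(\theta)
  = \frac{1}{N} \sum_{\tau_i \in \mathcal{D}_{\text{demo}}} c_\theta(\tau_i)
  + \log \frac{1}{M} \sum_{\tau_j \sim q}
      \frac{\exp\!\left(-c_\theta(\tau_j)\right)}{q(\tau_j)}
```

Policy optimization refits q toward low-cost trajectories, which is what lowers the variance of this importance-sampled estimate of Z.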
The cost function is represented by a neural network, so complex, expressive costs can be learned without hand-engineered features, and two regularization techniques for IOC, one general and one specific to episodic domains, keep the learned cost well behaved. Because the algorithm learns both a cost function and a policy that executes the desired behavior, it remains effective on tasks that are too complex for a good global cost function to be recovered from a small number of demonstrations. It handles unknown dynamics and high-dimensional systems, and can be run on real physical systems with a modest number of samples. On a set of simulated benchmark tasks and on two real-world tasks learned directly from human demonstrations, the method outperforms prior approaches in both task complexity and sample efficiency, including manipulation behaviors that require torque control and vision. A sketch of one cost-learning update follows.
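The code below is a minimal sketch of a single cost-update step in this style, assuming a PyTorch cost network over per-step trajectory features. All names, shapes, and the optional smoothness regularizer are illustrative assumptions, not the paper's actual code:

```python
import torch
import torch.nn as nn

class CostNet(nn.Module):
    """Hypothetical per-step cost network: feature vector -> scalar cost."""
    def __init__(self, feat_dim, hidden=64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, feats):           # feats: (batch, T, feat_dim)
        return self.mlp(feats)[..., 0]  # (batch, T) per-step costs

def ioc_loss(cost_net, demo_feats, samp_feats, samp_log_q):
    """Sample-based MaxEnt IOC loss: mean demonstration cost plus the log
    of an importance-sampled estimate of the partition function Z."""
    demo_costs = cost_net(demo_feats).sum(dim=1)   # (N,) trajectory costs
    samp_costs = cost_net(samp_feats).sum(dim=1)   # (M,)
    m = torch.tensor(float(samp_costs.shape[0]))
    # log Z ~= logsumexp_j(-c(tau_j) - log q(tau_j)) - log M  (stable form)
    log_z = torch.logsumexp(-samp_costs - samp_log_q, dim=0) - torch.log(m)
    return demo_costs.mean() + log_z

def smoothness_regularizer(step_costs):
    """Second-difference penalty on the cost along each trajectory; a
    stand-in for the paper's general regularizer (exact form assumed)."""
    d2 = step_costs[:, 2:] - 2 * step_costs[:, 1:-1] + step_costs[:, :-2]
    return (d2 ** 2).mean()

# Usage: alternate this update with policy optimization that refits the
# sampling distribution q toward low-cost regions (the "guiding" step).
feat_dim, T = 10, 50
net = CostNet(feat_dim)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
demo = torch.randn(20, T, feat_dim)    # placeholder demonstration features
samp = torch.randn(40, T, feat_dim)    # placeholder policy samples
log_q = torch.randn(40)                # placeholder sample log-probs under q
loss = ioc_loss(net, demo, samp, log_q) \
       + 1e-3 * smoothness_regularizer(net(demo))
opt.zero_grad(); loss.backward(); opt.step()
```

The logsumexp form matters in practice: exponentiating raw trajectory costs directly would overflow for long horizons, so the estimate of log Z is computed entirely in log space.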