2011 | Marc Peter Deisenroth, Carl Edward Rasmussen
This paper introduces PILCO, a practical and data-efficient model-based policy search method. PILCO addresses model bias, a key problem in model-based reinforcement learning, by learning a probabilistic dynamics model and explicitly incorporating model uncertainty into long-term planning. This allows PILCO to learn from very limited data and to learn from scratch in only a few trials. The method uses approximate inference for policy evaluation and analytic policy gradients for policy improvement. The authors report unprecedented learning efficiency on challenging, high-dimensional control tasks, demonstrating PILCO's effectiveness in continuous state-action domains and its applicability to physical systems such as robots. The experimental results include successful learning of control policies for the cart-pole swing-up, cart-double-pendulum swing-up, and unicycle riding tasks, along with a comparison of PILCO's data efficiency against other reinforcement learning methods.
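
To make the idea of a probabilistic dynamics model concrete, the following is a minimal sketch, not the authors' implementation. It assumes a Gaussian process with a squared-exponential kernel as the model class (the summary above does not name one) and a hypothetical one-dimensional system; the kernel hyperparameters, the noise level, and the function names `se_kernel` and `gp_predict` are illustrative choices. It only shows that the model's predictive variance is small near observed states and large far from them, which is the uncertainty a planner can take into account instead of trusting a single point prediction.

```python
import numpy as np

def se_kernel(a, b, lengthscale=1.0, signal_var=1.0):
    """Squared-exponential kernel between 1-D input arrays a and b."""
    d = a[:, None] - b[None, :]
    return signal_var * np.exp(-0.5 * (d / lengthscale) ** 2)

def gp_predict(x_train, y_train, x_test, noise_var=1e-2):
    """Posterior mean and variance of a zero-mean GP at x_test."""
    K = se_kernel(x_train, x_train) + noise_var * np.eye(len(x_train))
    K_s = se_kernel(x_train, x_test)
    L = np.linalg.cholesky(K)
    # Solve K alpha = y_train via the Cholesky factor.
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = se_kernel(x_test, x_test).diagonal() - np.sum(v ** 2, axis=0)
    return mean, var

# Hypothetical toy system: the state change is 0.1 * sin(x),
# and transitions are observed only for states in [-1, 1].
rng = np.random.default_rng(0)
x_train = np.linspace(-1.0, 1.0, 15)
y_train = 0.1 * np.sin(x_train) + 0.01 * rng.standard_normal(x_train.size)

# Query one state inside the observed region and one far outside it.
x_test = np.array([0.0, 5.0])
mean, var = gp_predict(x_train, y_train, x_test)
print("predicted state change:", mean)
print("predictive variance:   ", var)
```

The variance at the second query point stays close to the prior variance because no training data lies nearby. A deterministic model would return a confident prediction there, which is the model bias the summary refers to.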