2011 | Marc Peter Deisenroth, Carl Edward Rasmussen | PILCO: A Model-Based and Data-Efficient Approach to Policy Search (ICML 2011)
PILCO is a model-based, data-efficient policy search method that addresses model bias in reinforcement learning. It models the environment's dynamics with a nonparametric Gaussian process (GP) and incorporates the model's uncertainty into long-term planning and policy evaluation, which lets it learn effectively from very little experience. Policy evaluation uses deterministic approximate inference, and policy improvement uses analytic policy gradients. PILCO achieves unprecedented data efficiency in continuous state-action domains and is directly applicable to physical systems such as robots.
The paper presents experiments showing that PILCO can learn complex control tasks, such as cart-pole balancing, control of a cart-double-pendulum, and riding a unicycle, with minimal interaction time and without expert knowledge or informative priors. Because the probabilistic dynamics model explicitly accounts for model uncertainty, model bias is reduced, and PILCO outperforms other reinforcement learning methods in data efficiency. The paper concludes that PILCO is a practical and effective approach to policy search in model-based reinforcement learning.