9 Oct 2018 | Benjamin Eysenbach, Abhishek Gupta, Julian Ibarz, Sergey Levine
DIAYN (Diversity is All You Need) is a method for learning useful skills without a reward function. It learns skills by maximizing an information-theoretic objective with a maximum entropy policy. On a range of simulated robotic tasks, the method learns diverse skills such as walking and jumping, and on standard reinforcement learning benchmarks it discovers a skill that solves the benchmark task despite never receiving the true task reward. The paper further shows that pretrained skills provide a good parameter initialization for downstream tasks and can be composed hierarchically to solve complex, sparse-reward tasks. These results suggest that unsupervised skill discovery can serve as an effective pretraining mechanism for overcoming the challenges of exploration and data efficiency in reinforcement learning.
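Concretely, the information-theoretic objective encourages skills to be distinguishable from the states they visit while the entropy term keeps each skill's behavior as random as possible. In practice this reduces to training a skill-conditioned maximum entropy policy on the pseudo-reward r_z(s) = log q_φ(z | s) − log p(z), where q_φ is a learned discriminator that predicts the skill from the state and p(z) is a fixed (uniform) prior over skills. The sketch below illustrates that pseudo-reward and the discriminator update; the network architecture, skill count, and observation dimension are illustrative assumptions rather than the paper's exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Minimal sketch of the DIAYN pseudo-reward and discriminator update.
# Hyperparameters and architecture below are assumptions for illustration.

NUM_SKILLS = 20   # number of discrete skills, z ~ p(z) with p uniform (assumed)
OBS_DIM = 17      # observation dimension, e.g. a MuJoCo locomotion task (assumed)


class SkillDiscriminator(nn.Module):
    """q_phi(z | s): predicts which skill produced a visited state."""

    def __init__(self, obs_dim: int, num_skills: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, num_skills),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        # Returns unnormalized logits over skills.
        return self.net(obs)


def diayn_reward(disc: SkillDiscriminator, obs: torch.Tensor,
                 skill: torch.Tensor, num_skills: int) -> torch.Tensor:
    """Pseudo-reward r_z(s) = log q_phi(z | s) - log p(z), with a uniform prior p(z)."""
    with torch.no_grad():
        log_q = F.log_softmax(disc(obs), dim=-1)                 # log q_phi(. | s)
        log_q_z = log_q.gather(-1, skill.unsqueeze(-1)).squeeze(-1)
    log_p_z = -torch.log(torch.tensor(float(num_skills)))        # log(1 / num_skills)
    return log_q_z - log_p_z


def discriminator_loss(disc: SkillDiscriminator, obs: torch.Tensor,
                       skill: torch.Tensor) -> torch.Tensor:
    """Train q_phi to recover the sampled skill from the states it visits."""
    return F.cross_entropy(disc(obs), skill)
```

The skill-conditioned policy is then trained with an off-policy maximum entropy algorithm (the paper uses soft actor-critic) on this pseudo-reward, so skills become mutually distinguishable from their state visitation while the entropy bonus keeps each one exploratory.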