EXPLORATION BY RANDOM NETWORK DISTILLATION

30 Oct 2018 | Yuri Burda*, Harrison Edwards*, Amos Storkey, Oleg Klimov
This paper introduces a novel exploration bonus for deep reinforcement learning (RL) methods, called Random Network Distillation (RND), which is easy to implement and adds minimal computational overhead. The RND bonus is the prediction error of a neural network trained to predict the output of a fixed, randomly initialized target network on each observation. The authors also propose a method to combine intrinsic and extrinsic rewards flexibly. They demonstrate that the RND bonus, combined with this flexibility, significantly improves performance on several challenging Atari games, particularly on Montezuma’s Revenge, a game known for its difficulty for RL agents. The method achieves state-of-the-art performance on Montezuma’s Revenge without using demonstrations or access to the underlying game state, and occasionally completes the first level. The paper includes extensive ablations and comparisons to baseline methods, showing that the RND bonus effectively encourages exploration and can be scaled up to large numbers of parallel environments.
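
To make the mechanism concrete, below is a minimal sketch of an RND bonus in PyTorch. It is not the paper's implementation: the MLP architecture, embedding size, learning rate, and observation dimensions are illustrative assumptions (the paper uses convolutional networks on Atari frames and normalizes observations and intrinsic rewards). The core idea it demonstrates is faithful to the summary above: a fixed random target network, a predictor trained to match it, and the per-observation prediction error used as the intrinsic reward, so novel states, which the predictor has not yet fit, receive a larger bonus.

```python
import torch
import torch.nn as nn

class RNDBonus(nn.Module):
    """Sketch of a Random Network Distillation exploration bonus.

    A fixed, randomly initialized target network maps observations to
    embeddings; a predictor network is trained to reproduce those
    embeddings. The MSE between the two serves as the intrinsic reward.
    Sizes here are illustrative, not the paper's exact architecture.
    """

    def __init__(self, obs_dim: int, embed_dim: int = 128):
        super().__init__()
        def mlp() -> nn.Sequential:
            return nn.Sequential(
                nn.Linear(obs_dim, 256), nn.ReLU(),
                nn.Linear(256, embed_dim),
            )
        self.target = mlp()     # fixed random network, never trained
        self.predictor = mlp()  # trained to imitate the target
        for p in self.target.parameters():
            p.requires_grad_(False)

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        # Intrinsic reward: per-observation squared prediction error.
        err = (self.predictor(obs) - self.target(obs).detach()) ** 2
        return err.mean(dim=-1)

# Usage: the bonus doubles as the predictor's training loss, so states
# visited often are predicted well and their bonus decays toward zero.
bonus = RNDBonus(obs_dim=64)                       # obs_dim is hypothetical
opt = torch.optim.Adam(bonus.predictor.parameters(), lr=1e-4)
obs = torch.randn(32, 64)                          # a batch of observations
intrinsic_reward = bonus(obs)                      # larger for novel states
loss = intrinsic_reward.mean()
opt.zero_grad(); loss.backward(); opt.step()
```

In the full method this intrinsic reward is combined with the extrinsic game reward rather than replacing it; the paper handles the two streams with separate value estimates, which is the "flexible combination" referred to above.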