EXPLORATION BY RANDOM NETWORK DISTILLATION

30 Oct 2018 | Yuri Burda*, Harrison Edwards*, Amos Storkey, Oleg Klimov
This paper introduces Random Network Distillation (RND), an exploration method for deep reinforcement learning. RND measures the novelty of an observation by how poorly a predictor network, trained on the agent's past experience, can reproduce the output of a fixed, randomly initialized target network on that observation. Because neural networks tend to have lower prediction error on inputs similar to those they were trained on, this prediction error quantifies novelty and is used as an exploration bonus that encourages the agent to visit unfamiliar states. The bonus is simple to implement, efficient to compute, works well with high-dimensional observations, and can be combined with extrinsic rewards to improve exploration.
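As a concrete illustration of the bonus, here is a minimal PyTorch sketch. The network sizes, learning rate, and random observation batch are assumptions for illustration, not the paper's convolutional architecture: a fixed, randomly initialized target network defines the prediction problem, and the predictor's squared error on an observation serves as the intrinsic reward.

import torch
import torch.nn as nn

def make_net(obs_dim: int, emb_dim: int) -> nn.Module:
    # Small MLP stand-in for the paper's CNN embeddings (assumed sizes).
    return nn.Sequential(nn.Linear(obs_dim, 256), nn.ReLU(), nn.Linear(256, emb_dim))

obs_dim, emb_dim = 64, 32            # illustrative dimensions
target = make_net(obs_dim, emb_dim)  # fixed, randomly initialized target
predictor = make_net(obs_dim, emb_dim)
for p in target.parameters():        # the target network is never trained
    p.requires_grad_(False)

opt = torch.optim.Adam(predictor.parameters(), lr=1e-4)

def intrinsic_reward(obs: torch.Tensor) -> torch.Tensor:
    """Per-observation prediction error of the predictor on the fixed target."""
    with torch.no_grad():
        target_feat = target(obs)
    pred_feat = predictor(obs)
    return (pred_feat - target_feat).pow(2).mean(dim=-1)

# Training step: minimizing the prediction error on visited states means
# frequently seen states yield a smaller bonus over time, while novel
# states keep producing large errors and hence large exploration bonuses.
obs_batch = torch.randn(128, obs_dim)   # stand-in for observed states
loss = intrinsic_reward(obs_batch).mean()
opt.zero_grad()
loss.backward()
opt.step()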
The paper also introduces a way to combine intrinsic and extrinsic rewards flexibly: the agent uses two value heads, one per reward stream, which allows different discount factors for the two streams and lets an episodic extrinsic return be combined with a non-episodic intrinsic return. This flexibility lets the agent explore more effectively and discover more rooms in Montezuma's Revenge.
RND is evaluated on several hard-exploration Atari games: Montezuma's Revenge, Gravitar, Pitfall!, Private Eye, Solaris, and Venture. It outperforms other methods, most notably on Montezuma's Revenge, where it achieves state-of-the-art performance without using demonstrations or access to the underlying state of the game, and it also reaches strong scores on Gravitar and Venture.
The paper further discusses how RND relates to uncertainty quantification and how prediction error can serve as a novelty detector, as well as the importance of observation normalization and of normalizing the intrinsic reward so that its scale stays consistent across environments. The experiments show that RND is effective in sparse-reward environments and in large state spaces, combines well with extrinsic rewards, and works with different policy architectures, including convolutional and recurrent networks. Overall, the results suggest that RND is a promising exploration method for deep reinforcement learning, particularly when rewards are sparse.
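The dual value-head idea described above can be sketched as follows. The layer sizes, discount factors, and example rewards are illustrative assumptions, not the paper's exact PPO configuration; the point is that each reward stream gets its own value head, its own discount, and its own episodic or non-episodic return computation.

import torch
import torch.nn as nn

class TwoHeadActorCritic(nn.Module):
    """Policy trunk with separate extrinsic and intrinsic value heads
    (layer sizes are illustrative, not the paper's CNN)."""
    def __init__(self, obs_dim: int, n_actions: int):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(obs_dim, 256), nn.ReLU())
        self.policy_logits = nn.Linear(256, n_actions)
        self.value_ext = nn.Linear(256, 1)   # extrinsic stream: episodic
        self.value_int = nn.Linear(256, 1)   # intrinsic stream: non-episodic

    def forward(self, obs: torch.Tensor):
        h = self.trunk(obs)
        return self.policy_logits(h), self.value_ext(h), self.value_int(h)

def returns(rewards, gamma, dones=None):
    """Discounted returns for one reward stream; dones=None treats the
    stream as non-episodic (returns are not reset at episode ends)."""
    running, out = 0.0, []
    for t in reversed(range(len(rewards))):
        if dones is not None and dones[t]:
            running = 0.0
        running = rewards[t] + gamma * running
        out.append(running)
    out.reverse()
    return torch.tensor(out)

# Each stream can use its own discount factor (values here are assumptions),
# and the two advantage estimates are then mixed with per-stream coefficients.
ext_rewards = [0.0, 0.0, 1.0]
int_rewards = [0.3, 0.1, 0.05]   # e.g. RND bonuses, typically divided by a
                                 # running std estimate before use
dones = [False, False, True]
ret_ext = returns(ext_rewards, gamma=0.999, dones=dones)   # episodic
ret_int = returns(int_rewards, gamma=0.99, dones=None)     # non-episodic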