This paper introduces Distributional Random Network Distillation (DRND), an improvement over the Random Network Distillation (RND) method that addresses the issue of bonus inconsistency in deep reinforcement learning. RND, a popular exploration method, suffers from inconsistent bonus allocation, particularly during early training and in environments with sparse reward signals. DRND enhances exploration by distilling a distribution of random target networks and implicitly incorporating pseudo-counts, improving the precision of bonus allocation. This allows agents to explore more effectively without significant computational overhead. Theoretical analysis and experimental results show that DRND outperforms RND in exploration-demanding environments and serves as an effective anti-exploration mechanism in offline tasks. DRND is also integrated with Proximal Policy Optimization (PPO) and demonstrates superior performance in online exploration scenarios and on D4RL offline tasks. The method provides better intrinsic rewards and more accurate estimates of state transition frequencies, leading to improved performance across a range of reinforcement learning settings.
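To make the idea of distilling a distribution of random networks concrete, the following is a minimal PyTorch sketch, not the paper's exact implementation. It assumes DRND keeps N fixed, randomly initialized target networks and a single trainable predictor: the predictor is distilled toward randomly sampled targets, the distance to the targets' mean serves as an RND-like novelty term, and a variance-based statistic across targets acts as the implicit pseudo-count correction. Network sizes, the two-term bonus weighting `alpha`, and the exact form of the second term are illustrative assumptions.

```python
# Illustrative sketch of a DRND-style intrinsic bonus (assumptions noted above).
import torch
import torch.nn as nn


def make_net(obs_dim: int, out_dim: int = 64) -> nn.Module:
    # Small MLP; the paper's architectures may differ.
    return nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(), nn.Linear(128, out_dim))


class DRNDBonus(nn.Module):
    def __init__(self, obs_dim: int, n_targets: int = 10, out_dim: int = 64):
        super().__init__()
        # N fixed random target networks: never trained, they define the target distribution.
        self.targets = nn.ModuleList(make_net(obs_dim, out_dim) for _ in range(n_targets))
        for t in self.targets:
            for p in t.parameters():
                p.requires_grad_(False)
        # Single trainable predictor distilled toward the target distribution.
        self.predictor = make_net(obs_dim, out_dim)

    def _target_stats(self, obs: torch.Tensor):
        outs = torch.stack([t(obs) for t in self.targets])  # (N, B, D)
        mu = outs.mean(dim=0)                                # first moment of targets
        second_moment = (outs ** 2).mean(dim=0)              # second moment of targets
        return mu, second_moment

    def loss(self, obs: torch.Tensor) -> torch.Tensor:
        # Distillation loss: match one randomly sampled target per update
        # (an assumption; other distillation objectives are possible).
        idx = torch.randint(len(self.targets), (1,)).item()
        with torch.no_grad():
            target_out = self.targets[idx](obs)
        return ((self.predictor(obs) - target_out) ** 2).mean()

    @torch.no_grad()
    def bonus(self, obs: torch.Tensor, alpha: float = 0.9) -> torch.Tensor:
        mu, second_moment = self._target_stats(obs)
        pred = self.predictor(obs)
        # Term 1: prediction error to the mean target output (novelty, RND-like).
        b1 = ((pred - mu) ** 2).mean(dim=-1)
        # Term 2: variance-ratio statistic acting as an implicit pseudo-count proxy
        # (hypothetical form standing in for the paper's count-based correction).
        var = (second_moment - mu ** 2).clamp_min(1e-8)
        b2 = ((pred ** 2 - mu ** 2).abs() / var).mean(dim=-1).sqrt()
        return alpha * b1 + (1 - alpha) * b2
```

In an online setting such as PPO, the sketch would be used by adding `bonus(next_obs)` to the extrinsic reward and minimizing `loss(next_obs)` alongside the policy update; in an offline anti-exploration setting, the same quantity would instead be subtracted as a penalty for out-of-distribution states.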