This paper introduces Distributional Random Network Distillation (DRND), an improvement over the Random Network Distillation (RND) method that addresses the issue of bonus inconsistency in deep reinforcement learning. RND, a popular exploration method, suffers from inconsistent bonus allocation, particularly during early training and in environments with sparse reward signals. DRND enhances exploration by distilling a distribution of random target networks and implicitly incorporating pseudo-counts, improving the precision of bonus allocation. This allows agents to explore more effectively without significant computational overhead. Theoretical analysis and experimental results show that DRND outperforms RND in exploration-demanding environments and serves as an effective anti-exploration mechanism in offline tasks. DRND is also integrated with Proximal Policy Optimization (PPO) and demonstrates superior performance in online exploration scenarios and on D4RL offline tasks. The method provides better intrinsic rewards and more accurate estimates of state transition frequencies, leading to improved performance across a range of reinforcement learning settings.
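To make the idea of distilling a distribution of random networks concrete, the following is a minimal PyTorch sketch, not the paper's exact implementation. It assumes DRND keeps N fixed, randomly initialized target networks and a single trainable predictor: the predictor is distilled toward randomly sampled targets, the distance to the targets' mean serves as an RND-like novelty term, and a variance-based statistic across targets acts as the implicit pseudo-count correction. Network sizes, the two-term bonus weighting `alpha`, and the exact form of the second term are illustrative assumptions.

```python
# Illustrative sketch of a DRND-style intrinsic bonus (assumptions noted above).
import torch
import torch.nn as nn


def make_net(obs_dim: int, out_dim: int = 64) -> nn.Module:
    # Small MLP; the paper's architectures may differ.
    return nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(), nn.Linear(128, out_dim))


class DRNDBonus(nn.Module):
    def __init__(self, obs_dim: int, n_targets: int = 10, out_dim: int = 64):
        super().__init__()
        # N fixed random target networks: never trained, they define the target distribution.
        self.targets = nn.ModuleList(make_net(obs_dim, out_dim) for _ in range(n_targets))
        for t in self.targets:
            for p in t.parameters():
                p.requires_grad_(False)
        # Single trainable predictor distilled toward the target distribution.
        self.predictor = make_net(obs_dim, out_dim)

    def _target_stats(self, obs: torch.Tensor):
        outs = torch.stack([t(obs) for t in self.targets])  # (N, B, D)
        mu = outs.mean(dim=0)                                # first moment of targets
        second_moment = (outs ** 2).mean(dim=0)              # second moment of targets
        return mu, second_moment

    def loss(self, obs: torch.Tensor) -> torch.Tensor:
        # Distillation loss: match one randomly sampled target per update
        # (an assumption; other distillation objectives are possible).
        idx = torch.randint(len(self.targets), (1,)).item()
        with torch.no_grad():
            target_out = self.targets[idx](obs)
        return ((self.predictor(obs) - target_out) ** 2).mean()

    @torch.no_grad()
    def bonus(self, obs: torch.Tensor, alpha: float = 0.9) -> torch.Tensor:
        mu, second_moment = self._target_stats(obs)
        pred = self.predictor(obs)
        # Term 1: prediction error to the mean target output (novelty, RND-like).
        b1 = ((pred - mu) ** 2).mean(dim=-1)
        # Term 2: variance-ratio statistic acting as an implicit pseudo-count proxy
        # (hypothetical form standing in for the paper's count-based correction).
        var = (second_moment - mu ** 2).clamp_min(1e-8)
        b2 = ((pred ** 2 - mu ** 2).abs() / var).mean(dim=-1).sqrt()
        return alpha * b1 + (1 - alpha) * b2
```

In an online setting such as PPO, the sketch would be used by adding `bonus(next_obs)` to the extrinsic reward and minimizing `loss(next_obs)` alongside the policy update; in an offline anti-exploration setting, the same quantity would instead be subtracted as a penalty for out-of-distribution states.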