15 May 2017 | Deepak Pathak, Pulkit Agrawal, Alexei A. Efros, Trevor Darrell
This paper introduces a novel approach to curiosity-driven exploration in reinforcement learning, where the agent's intrinsic reward is derived from the error in predicting the consequences of its own actions in a learned visual feature space. The method, built around an Intrinsic Curiosity Module (ICM), scales to high-dimensional continuous state spaces such as images, sidesteps the difficulty of predicting raw pixels, and learns to ignore environmental factors the agent cannot influence. The ICM is evaluated in two environments, VizDoom and Super Mario Bros., across three settings: sparse extrinsic rewards, no extrinsic rewards at all, and generalization to unseen scenarios. Results show that the ICM significantly improves performance on sparse-reward tasks, enables efficient exploration in the absence of any extrinsic reward, and generalizes well to new environments. The paper also discusses the robustness of the ICM to uncontrollable environmental dynamics and compares it against other state-of-the-art exploration methods.
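To make the mechanism concrete, here is a minimal PyTorch sketch of the ICM idea, not the authors' implementation: all layer sizes, the MLP encoder (the paper uses a convolutional encoder over stacked frames), and the detaching of the forward-model target are illustrative assumptions. The key pieces are the three components described above: an encoder phi(s), an inverse model that recovers the action from (phi(s_t), phi(s_{t+1})) and thereby shapes the features to keep only what the agent can affect, and a forward model whose prediction error in feature space serves as the curiosity reward.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ICMSketch(nn.Module):
    """Minimal sketch of an Intrinsic Curiosity Module.

    Hypothetical sizes throughout; an MLP stands in for the paper's
    convolutional encoder, purely for illustration.
    """

    def __init__(self, state_dim=64, feature_dim=32, num_actions=4):
        super().__init__()
        # Feature encoder phi(s), trained jointly with the inverse model
        # so it encodes only factors relevant to the agent's actions.
        self.encoder = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, feature_dim),
        )
        # Inverse model: predicts a_t from (phi(s_t), phi(s_{t+1})).
        self.inverse = nn.Sequential(
            nn.Linear(2 * feature_dim, 64), nn.ReLU(),
            nn.Linear(64, num_actions),
        )
        # Forward model: predicts phi(s_{t+1}) from (phi(s_t), a_t).
        self.forward_model = nn.Sequential(
            nn.Linear(feature_dim + num_actions, 64), nn.ReLU(),
            nn.Linear(64, feature_dim),
        )
        self.num_actions = num_actions

    def forward(self, state, next_state, action):
        phi, phi_next = self.encoder(state), self.encoder(next_state)
        a_onehot = F.one_hot(action, self.num_actions).float()

        # Forward-model prediction error is the curiosity bonus:
        # a hard-to-predict consequence is treated as "interesting".
        # Detaching the target here is a common implementation choice
        # (an assumption of this sketch) to avoid feature collapse.
        phi_next_pred = self.forward_model(torch.cat([phi, a_onehot], dim=1))
        intrinsic_reward = 0.5 * (phi_next_pred - phi_next.detach()).pow(2).sum(dim=1)

        # Inverse-model loss shapes the feature space: the features must
        # suffice to recover the action, so uncontrollable distractors
        # (e.g. moving leaves) are discouraged from being encoded.
        action_logits = self.inverse(torch.cat([phi, phi_next], dim=1))
        inverse_loss = F.cross_entropy(action_logits, action)

        forward_loss = intrinsic_reward.mean()
        return intrinsic_reward.detach(), forward_loss, inverse_loss


if __name__ == "__main__":
    icm = ICMSketch()
    s = torch.randn(8, 64)        # batch of current states
    s_next = torch.randn(8, 64)   # batch of next states
    a = torch.randint(0, 4, (8,))
    r_i, f_loss, i_loss = icm(s, s_next, a)
    print(r_i.shape)  # per-sample curiosity bonus, shape (8,)
```

In use, the intrinsic reward would be added to whatever extrinsic reward the environment provides (scaled by a coefficient), while the weighted sum of the forward and inverse losses is minimized alongside the policy objective.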