Curiosity-driven Exploration by Self-supervised Prediction


15 May 2017 | Deepak Pathak, Pulkit Agrawal, Alexei A. Efros, Trevor Darrell
This paper introduces a method for curiosity-driven exploration in reinforcement learning, where curiosity is formulated as the error in the agent's ability to predict the consequences of its own actions in a visual feature space learned by a self-supervised inverse dynamics model. The resulting intrinsic reward lets the agent explore its environment and acquire skills that may be useful later. The method is evaluated in two environments, VizDoom and Super Mario Bros., across three settings: 1) sparse extrinsic reward, where curiosity lets the agent reach the goal with far fewer interactions; 2) exploration with no extrinsic reward at all, where curiosity pushes the agent to explore more efficiently; and 3) generalization to unseen scenarios, where knowledge gained from earlier experience helps the agent explore new settings faster. The agent is composed of two subsystems: a reward generator that outputs a curiosity-driven intrinsic reward signal, and a policy that outputs a sequence of actions to maximize that reward. The policy is trained to maximize the sum of the intrinsic and extrinsic rewards.
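To make the reward composition concrete, here is a minimal Python sketch (an illustration, not the authors' code: the function names, the NumPy feature vectors, and the scaling coefficient eta are assumptions). At every step the curiosity bonus is computed from the forward model's prediction error in feature space and added to whatever reward the environment returns; setting the extrinsic reward to zero leaves a purely curiosity-driven agent.

import numpy as np

def intrinsic_reward(phi_next_pred, phi_next, eta=0.01):
    # Curiosity bonus: scaled squared error between the predicted and actual
    # feature embeddings of the next state (phi_next_pred vs. phi_next).
    return 0.5 * eta * float(np.sum((phi_next_pred - phi_next) ** 2))

def total_reward(extrinsic, phi_next_pred, phi_next):
    # The policy is trained on the sum of the two signals; with extrinsic = 0
    # the agent is driven by curiosity alone.
    return extrinsic + intrinsic_reward(phi_next_pred, phi_next)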
The intrinsic curiosity reward is the prediction error of a forward dynamics model trained in a feature space that captures only the information relevant to the agent's actions. This feature space is learned by self-supervision: an inverse dynamics network is trained to predict the agent's action from its current and next states, so the learned features retain what the agent can influence and discard the rest. A forward model then predicts the feature representation of the next state given the current features and the action, and its prediction error is used as the intrinsic reward. The method proves effective in sparse-reward settings, and even with no extrinsic reward the agent learns to explore its environment efficiently; it also generalizes to new scenarios, where knowledge from earlier experience helps the agent explore faster. Because prediction happens in the learned feature space rather than in pixel space, the approach is robust to uncontrollable aspects of the environment such as distractor objects and changes in illumination. Compared against baselines including a pixel-based curiosity model and a variational information maximization approach (VIME), the method achieves better exploration efficiency and generalization. The results indicate that the proposed method enables the agent to learn generalizable skills even in the absence of an explicit goal.
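The curiosity module described above can be sketched in a few lines of PyTorch. This is a simplified, hypothetical implementation rather than the authors' code: the class name CuriosityModule, the fully connected encoder over a flattened observation (the paper uses a convolutional encoder over stacked image frames), the hidden sizes, the scaling factor eta, and the choice to block forward-model gradients from reaching the encoder are all assumptions made for the sketch.

import torch
import torch.nn as nn
import torch.nn.functional as F

class CuriosityModule(nn.Module):
    # Encoder phi, an inverse model that predicts the action from phi(s_t) and
    # phi(s_t+1), and a forward model that predicts phi(s_t+1) from phi(s_t)
    # and the action; the forward model's error is the intrinsic reward.

    def __init__(self, obs_dim, n_actions, feat_dim=256, eta=0.01):
        super().__init__()
        self.eta = eta
        self.n_actions = n_actions
        self.encoder = nn.Sequential(nn.Linear(obs_dim, 256), nn.ELU(),
                                     nn.Linear(256, feat_dim))
        self.inverse_model = nn.Sequential(nn.Linear(2 * feat_dim, 256), nn.ELU(),
                                           nn.Linear(256, n_actions))
        self.forward_model = nn.Sequential(nn.Linear(feat_dim + n_actions, 256), nn.ELU(),
                                           nn.Linear(256, feat_dim))

    def forward(self, obs, next_obs, action):
        # obs, next_obs: float tensors [B, obs_dim]; action: long tensor [B].
        phi, phi_next = self.encoder(obs), self.encoder(next_obs)
        a_onehot = F.one_hot(action, self.n_actions).float()

        # Inverse dynamics loss: recover the action taken between the two states.
        # This loss is what shapes the encoder's feature space.
        action_logits = self.inverse_model(torch.cat([phi, phi_next], dim=1))
        inverse_loss = F.cross_entropy(action_logits, action)

        # Forward dynamics in feature space; gradients into the encoder are
        # blocked here (a common implementation choice) so the features are
        # shaped only by the inverse model.
        phi_next_pred = self.forward_model(torch.cat([phi.detach(), a_onehot], dim=1))
        forward_loss = 0.5 * (phi_next_pred - phi_next.detach()).pow(2).sum(dim=1).mean()

        # Per-sample intrinsic reward: scaled forward-model prediction error.
        with torch.no_grad():
            r_intrinsic = 0.5 * self.eta * (phi_next_pred - phi_next).pow(2).sum(dim=1)

        return r_intrinsic, inverse_loss, forward_loss

In the paper, the inverse and forward losses are combined with a weighting factor (beta), the intrinsic reward is added to the extrinsic reward as in the earlier sketch, and the policy itself is trained with A3C; those training-loop details are independent of the module above.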