Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation

31 May 2016 | Tejas D. Kulkarni, Karthik R. Narasimhan, Ardavan Saeedi, Joshua B. Tenenbaum
This paper introduces hierarchical-DQN (h-DQN), a hierarchical deep reinforcement learning framework that integrates hierarchical value functions with intrinsic motivation to improve exploration in environments with sparse feedback. The framework operates at different temporal scales: a top-level value function learns a policy over intrinsic goals, while a lower-level value function learns a policy over atomic actions that satisfies the chosen goal. Goals can be specified flexibly, for example as functions over entities and relations, which supports efficient exploration in complex environments. The effectiveness of h-DQN is demonstrated on two tasks with very sparse, delayed feedback: a complex discrete stochastic decision process and the classic ATARI game 'Montezuma's Revenge'.

The framework uses temporal abstraction through options, i.e., temporally extended policies that terminate according to a stochastic function. The agent is motivated to solve intrinsic goals (by learning options), which aids exploration and mitigates the sparse-feedback problem. The model uses a two-stage hierarchy consisting of a meta-controller and a controller: the meta-controller selects goals, and the controller uses the current state and the chosen goal to select atomic actions. An internal critic evaluates whether the goal has been reached and provides an appropriate intrinsic reward. The controller's objective is to maximize cumulative intrinsic reward, while the meta-controller aims to maximize cumulative extrinsic reward from the environment.
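Written as value functions, these two objectives take roughly the following form. This is a reconstruction consistent with the description above rather than a verbatim quotation of the paper's equations; here r denotes the intrinsic reward from the critic, f the extrinsic environment reward, g a goal, N the number of steps the controller runs before the goal is reached or the episode ends, and γ a discount factor.

```latex
% Controller: maximize cumulative intrinsic reward while pursuing goal g
Q_1^*(s, a; g) = \max_{\pi_{ag}} \mathbb{E}\left[ r_t + \gamma \max_{a'} Q_1^*(s_{t+1}, a'; g) \,\middle|\, s_t = s,\ a_t = a,\ g_t = g \right]

% Meta-controller: maximize cumulative extrinsic reward over sequences of goals
Q_2^*(s, g) = \max_{\pi_g} \mathbb{E}\left[ \sum_{t'=t}^{t+N} f_{t'} + \gamma \max_{g'} Q_2^*(s_{t+N}, g') \,\middle|\, s_t = s,\ g_t = g \right]
```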
The paper also discusses related work on reinforcement learning with temporal abstractions, intrinsic motivation, object-based reinforcement learning, and deep reinforcement learning, highlighting the role of intrinsic motivation in exploration and the potential of parameterizing goals over entities and relations for more efficient learning. Experiments show that h-DQN outperforms standard Q-learning baselines on tasks with sparse rewards, in particular the discrete stochastic decision process and 'Montezuma's Revenge'. The results demonstrate that combining intrinsic motivation with hierarchical abstraction yields more efficient exploration and better performance in environments with delayed rewards.
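Putting the pieces together, the interaction between the meta-controller, the controller, and the internal critic can be sketched as two nested loops. The sketch below is a minimal illustration of that control flow rather than the paper's implementation; the class interfaces (`select_goal`, `select_action`, `store`, `update`), the `critic` callable, and the binary intrinsic reward are hypothetical stand-ins, and in practice both levels are deep Q-networks trained from separate replay memories with epsilon-greedy exploration.

```python
def h_dqn_episode(env, meta_controller, controller, critic, goals):
    """Run one episode of the two-level h-DQN control loop (illustrative sketch).

    meta_controller -- picks a goal given a state; trained on extrinsic reward (Q2).
    controller      -- picks atomic actions given (state, goal); trained on intrinsic reward (Q1).
    critic          -- returns True when the goal is satisfied in a state.
    """
    state = env.reset()
    done = False
    while not done:
        # Top level: choose an intrinsic goal, e.g. "reach the key" in Montezuma's Revenge.
        goal = meta_controller.select_goal(state, goals)
        start_state, extrinsic_return = state, 0.0

        # Bottom level: act with atomic actions until the goal is reached or the episode ends.
        while not done and not critic(state, goal):
            action = controller.select_action(state, goal)
            next_state, extrinsic_reward, done, _ = env.step(action)

            # Internal critic provides the intrinsic reward for the controller.
            intrinsic_reward = 1.0 if critic(next_state, goal) else 0.0
            controller.store(state, goal, action, intrinsic_reward, next_state, done)
            controller.update()          # Q-learning step on intrinsic reward (Q1)

            extrinsic_return += extrinsic_reward
            state = next_state

        # Top level is trained on the extrinsic reward accumulated while pursuing the goal.
        meta_controller.store(start_state, goal, extrinsic_return, state, done)
        meta_controller.update()         # Q-learning step on extrinsic reward (Q2)
```

Decoupling the two timescales in this way lets the meta-controller learn over long horizons of goal choices while the controller receives dense intrinsic feedback, which is what mitigates the sparse extrinsic reward.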