Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation


31 May 2016 | Tejas D. Kulkarni, Karthik R. Narasimhan, Ardavan Saeedi, Joshua B. Tenenbaum
The paper introduces h-DQN (Hierarchical Deep Q-Network), a framework that integrates hierarchical value functions with intrinsically motivated deep reinforcement learning to address the challenge of learning goal-directed behavior in environments with sparse feedback. h-DQN consists of a top-level meta-controller and a lower-level controller: the meta-controller learns a policy over intrinsic goals, while the controller learns a policy over atomic actions to achieve those goals. The framework allows flexible goal specifications, such as functions over entities and relations, which enables efficient exploration in complex environments. The authors demonstrate the effectiveness of h-DQN on two problems with very sparse, delayed feedback: a complex discrete stochastic decision process and the classic ATARI game 'Montezuma's Revenge.' The results show that h-DQN outperforms traditional Q-learning in both environments, achieving higher average rewards and more efficient exploration. The paper also discusses related work in reinforcement learning, including temporal abstraction, intrinsically motivated reinforcement learning, object-based reinforcement learning, and deep reinforcement learning.
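
To make the two-level structure concrete, here is a minimal sketch of the hierarchical control loop, assuming a toy 6-state chain environment and tabular Q-learning in place of the paper's deep Q-networks. The environment dynamics, hyperparameters, and goal set are illustrative assumptions, not the authors' setup; only the overall scheme (meta-controller selects goals, controller earns intrinsic reward for reaching them, meta-controller learns from the accumulated extrinsic reward) follows the framework summarized above.

```python
import random
from collections import defaultdict

# Illustrative h-DQN-style loop with tabular Q-learning (a simplification of the
# paper's deep Q-networks) on a hypothetical 6-state chain MDP.

N_STATES = 6                    # states 0..5; episode ends when state 0 is reached
START = 1
ACTIONS = (-1, +1)              # primitive actions: move left / move right
GOALS = tuple(range(N_STATES))  # a goal here is simply "reach state g"
ALPHA, GAMMA, EPS = 0.1, 0.95, 0.1

q_meta = defaultdict(float)     # Q2(s, g): meta-controller value of picking goal g in state s
q_ctrl = defaultdict(float)     # Q1(s, g, a): controller value of action a while pursuing g

def eps_greedy(q, candidates, key_fn):
    """Epsilon-greedy choice among candidates, scored by q[key_fn(candidate)]."""
    if random.random() < EPS:
        return random.choice(candidates)
    return max(candidates, key=lambda c: q[key_fn(c)])

def run_episode():
    s, visited_right_end, done, episode_return = START, False, False, 0.0
    while not done:
        # Meta-controller: pick an intrinsic goal for the current state.
        g = eps_greedy(q_meta, GOALS, lambda g_: (s, g_))
        s_meta, meta_return = s, 0.0
        # Controller: take primitive actions until the goal or the episode ends.
        while not done and s != g:
            a = eps_greedy(q_ctrl, ACTIONS, lambda a_: (s, g, a_))
            # Toy stochastic dynamics: intended move succeeds half the time, else drift left.
            s2 = min(max(s + (a if random.random() < 0.5 else -1), 0), N_STATES - 1)
            visited_right_end |= (s2 == N_STATES - 1)
            done = (s2 == 0)
            extrinsic = (1.0 if visited_right_end else 0.01) if done else 0.0
            intrinsic = 1.0 if s2 == g else 0.0          # internal critic's reward
            boot = 0.0 if (done or s2 == g) else max(q_ctrl[(s2, g, a2)] for a2 in ACTIONS)
            q_ctrl[(s, g, a)] += ALPHA * (intrinsic + GAMMA * boot - q_ctrl[(s, g, a)])
            meta_return += extrinsic
            s = s2
        # Meta-controller learns from the extrinsic reward accumulated while pursuing g.
        boot_meta = 0.0 if done else max(q_meta[(s, g2)] for g2 in GOALS)
        q_meta[(s_meta, g)] += ALPHA * (meta_return + GAMMA * boot_meta - q_meta[(s_meta, g)])
        episode_return += meta_return
    return episode_return

if __name__ == "__main__":
    returns = [run_episode() for _ in range(5000)]
    print("mean return over last 500 episodes:", sum(returns[-500:]) / 500)
```

In this sketch the intrinsic reward is a simple indicator for reaching the chosen goal state; in the paper the internal critic plays this role and goals can be richer predicates over entities and relations (for example, reaching a particular object on screen in Montezuma's Revenge).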