6 Mar 2017 | Alexander Sasha Vezhnevets, Simon Osindero, Tom Schaul, Nicolas Heess, Max Jaderberg, David Silver, Koray Kavukcuoglu
FeUdal Networks (FuNs) are a novel architecture for hierarchical reinforcement learning, inspired by the feudal reinforcement learning (FRL) framework proposed by Dayan and Hinton. FuNs consist of two modules: a Manager and a Worker. The Manager operates at a lower temporal resolution, setting abstract goals in a latent state space, while the Worker generates primitive actions at a higher temporal resolution. The Manager's goals are trained using an approximate transition policy gradient, which exploits the semantic meaning of the goals. The Worker is intrinsically motivated to follow these goals, receiving rewards based on how well it achieves them. This decoupling allows for long-term credit assignment and the emergence of sub-policies associated with different goals.

Experiments on various Atari games and a 3D DeepMind Lab environment demonstrate that FuNs significantly outperform baseline agents in tasks requiring long-term credit assignment and memorization. Key contributions include a consistent, end-to-end differentiable model, a novel approximate transition policy gradient update, the use of directional goals, and a dilated LSTM design for the Manager.
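The Worker's intrinsic motivation can be made concrete with a small sketch. In FuN, the Manager's goals are directions in latent state space, and the Worker is rewarded for moving the latent state in those directions, measured by cosine similarity over a recent horizon. The sketch below is a minimal illustration of that idea, not the authors' implementation; the horizon `c`, the latent dimensionality, and the function names are illustrative assumptions.

```python
# Hedged sketch of FuN's directional intrinsic reward: the Worker earns
# reward for latent-state displacements that align (in cosine similarity)
# with the Manager's recently issued goal vectors. Names and the horizon
# c are illustrative, not taken from the paper's code.
import numpy as np

def cosine(a, b, eps=1e-8):
    """Cosine similarity between two vectors, guarded against zero norms."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + eps))

def intrinsic_reward(states, goals, t, c):
    """Average cosine similarity between each displacement s_t - s_{t-i}
    and the goal g_{t-i} set at that earlier step, over the last c steps."""
    return sum(cosine(states[t] - states[t - i], goals[t - i])
               for i in range(1, c + 1)) / c

# Toy usage: latent states drift along the x-axis while every goal points
# along x, so the Worker's intrinsic reward is close to its maximum of 1.
states = [np.array([float(i), 0.0]) for i in range(5)]
goals = [np.array([1.0, 0.0]) for _ in range(5)]
reward = intrinsic_reward(states, goals, t=4, c=3)
```

Because the reward depends only on the direction of movement, not its magnitude, goals act as reusable sub-task specifications rather than absolute target states.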