IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures

28 Jun 2018 | Lasse Espeholt *† Hubert Soyer *† Remi Munos *† Karen Simonyan † Volodymyr Mnih † Tom Ward † Yotam Doron † Vlad Firoiu † Tim Harley † Iain Dunning † Shane Legg † Koray Kavukcuoglu †
The paper introduces IMPALA (Importance Weighted Actor-Learner Architecture), a scalable distributed reinforcement learning agent designed to solve a large collection of tasks with a single set of parameters. Training one agent on many tasks increases both the amount of data and the training time required; IMPALA addresses this by using resources efficiently in single-machine training and by scaling to thousands of machines without sacrificing data efficiency or resource utilization. It achieves stable learning at high throughput by combining decoupled acting and learning with a novel off-policy correction method called V-trace, which compensates for the lag between the actors' behaviour policies and the learner's current policy. On the multi-task benchmarks DMLab-30 and Atari-57, IMPALA achieves better performance than previous agents with less data and shows positive transfer between tasks. The source code is publicly available.
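To make the V-trace correction concrete, the following is a minimal sketch for a single trajectory, not the authors' implementation. It uses the recursion v_s = V(x_s) + δ_s + γ c_s (v_{s+1} − V(x_{s+1})), with δ_s = ρ_s (r_s + γ V(x_{s+1}) − V(x_s)), ρ_s = min(ρ̄, π/μ) and c_s = min(c̄, π/μ). The function name `vtrace_targets`, its argument names, and the default values for `gamma`, `rho_bar` and `c_bar` are assumptions made for illustration.

```python
import math


def vtrace_targets(behaviour_logp, target_logp, rewards, values,
                   bootstrap_value, gamma=0.99, rho_bar=1.0, c_bar=1.0):
    """Return V-trace targets v_s for one trajectory of length n.

    behaviour_logp[t]: log mu(a_t | x_t) under the actor (behaviour) policy.
    target_logp[t]:    log pi(a_t | x_t) under the learner (target) policy.
    values[t]:         current value estimate V(x_t).
    bootstrap_value:   V(x_n), the estimate for the state after the last step.
    (All names and defaults here are illustrative assumptions.)
    """
    n = len(rewards)
    # Importance ratios pi / mu, computed from log-probabilities.
    ratios = [math.exp(t - b) for t, b in zip(target_logp, behaviour_logp)]
    rhos = [min(rho_bar, r) for r in ratios]  # truncated weights for the TD errors
    cs = [min(c_bar, r) for r in ratios]      # truncated "trace" coefficients

    next_values = list(values[1:]) + [bootstrap_value]
    deltas = [rho * (r + gamma * nv - v)
              for rho, r, nv, v in zip(rhos, rewards, next_values, values)]

    targets = [0.0] * n
    acc = 0.0  # holds v_{s+1} - V(x_{s+1}); zero past the end of the trajectory
    for s in reversed(range(n)):
        acc = deltas[s] + gamma * cs[s] * acc
        targets[s] = values[s] + acc
    return targets


# On-policy usage: identical log-probabilities give ratios of exactly 1.
logp = [-0.7, -1.2, -0.3]
print(vtrace_targets(logp, logp, rewards=[1.0, 0.0, 2.0],
                     values=[0.5, 0.2, -0.1], bootstrap_value=0.3, gamma=0.9))
```

When the two policies agree and the truncation levels are at least 1, every ratio is 1 and the targets reduce to ordinary n-step Bellman targets (here approximately 2.8387, 2.043 and 2.27). When the learner's policy assigns almost no probability to the actor's actions, the targets collapse toward the current value estimates.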