Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model

21 Feb 2020 | Julian Schrittwieser, Ioannis Antonoglou, Thomas Hubert, Karen Simonyan, Laurent Sifre, Simon Schmitt, Arthur Guez, Edward Lockhart, Demis Hassabis, Thore Graepel, Timothy Lillicrap, David Silver
The paper introduces MuZero, a new approach to model-based reinforcement learning (RL) that achieves state-of-the-art performance in visually complex domains such as Atari games and superhuman performance in precision planning tasks such as chess, Go, and shogi. MuZero combines tree-based search with a learned model that predicts the reward, the action-selection policy, and the value function. The model receives observations as input and transforms them into a hidden state, which is then updated iteratively by a recurrent process that takes the previous hidden state and a hypothetical next action. The model is trained end to end so that its estimates of these quantities match the improved estimates produced by search and the observed rewards. Because MuZero requires no knowledge of the game rules or environment dynamics, it is applicable to a wide range of real-world domains. The algorithm is evaluated on 57 Atari games and on the classic board games, outperforming previous state-of-the-art methods in both settings.
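The three learned components described above — a representation function mapping observations to a hidden state, a dynamics function unrolling that state with hypothetical actions while predicting a reward, and a prediction function emitting a policy and value — can be sketched as follows. This is only an illustrative toy: the linear "networks", weight matrices, and dimensions here are hypothetical placeholders, not the deep architectures used in the paper.

```python
import numpy as np

# Toy stand-ins for MuZero's three functions: h (representation),
# g (dynamics + reward), and f (prediction). Weights are random
# placeholders, not trained parameters.
rng = np.random.default_rng(0)
OBS_DIM, HIDDEN_DIM, N_ACTIONS = 8, 4, 3

W_repr = rng.normal(size=(HIDDEN_DIM, OBS_DIM))                  # h: observation -> hidden state
W_dyn = rng.normal(size=(HIDDEN_DIM, HIDDEN_DIM + N_ACTIONS))    # g: (state, action) -> next state
w_rew = rng.normal(size=HIDDEN_DIM + N_ACTIONS)                  # g also predicts a scalar reward
W_pol = rng.normal(size=(N_ACTIONS, HIDDEN_DIM))                 # f: state -> policy logits
w_val = rng.normal(size=HIDDEN_DIM)                              # f: state -> scalar value

def representation(obs):
    """h: encode a raw observation into the initial hidden state."""
    return np.tanh(W_repr @ obs)

def dynamics(state, action):
    """g: advance the hidden state with a hypothetical action; also predict reward."""
    x = np.concatenate([state, np.eye(N_ACTIONS)[action]])
    return np.tanh(W_dyn @ x), float(w_rew @ x)

def prediction(state):
    """f: predict the action-selection policy and the value of a hidden state."""
    logits = W_pol @ state
    policy = np.exp(logits) / np.exp(logits).sum()  # softmax over actions
    return policy, float(w_val @ state)

# Unroll the model along a hypothetical action sequence, as tree search would,
# without ever consulting the real environment's rules or dynamics.
state = representation(rng.normal(size=OBS_DIM))
for action in [0, 2, 1]:
    policy, value = prediction(state)
    state, reward = dynamics(state, action)
```

Note that the unrolled loop never queries a simulator: every quantity the search needs (next state, reward, policy prior, value) comes from the learned model itself, which is what lets MuZero plan without knowing the environment's rules.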