| David Silver*, Julian Schrittwieser*, Karen Simonyan*, Ioannis Antonoglou, Aja Huang, Arthur Guez, Thomas Hubert, Lucas Baker, Matthew Lai, Adrian Bolton, Yutian Chen, Timothy Lillicrap, Fan Hui, Laurent Sifre, George van den Driessche, Thore Graepel, Demis Hassabis
AlphaGo Zero, developed by DeepMind, is a reinforcement learning system that achieves superhuman performance in the game of Go without human data, guidance, or domain knowledge beyond the game rules. It learns solely through self-play, using a single neural network that combines the policy and value functions. Monte Carlo tree search (MCTS), guided by this network, is used to evaluate positions and select moves; the network is then trained to match the move probabilities produced by the search and the eventual outcomes of the self-play games, so that search and network improve each other over successive iterations. AlphaGo Zero outperformed previous versions of AlphaGo by a large margin, defeating AlphaGo Lee, which was trained with human data, by 100 games to 0. On the Elo scale it reached a rating of 5,185, surpassing AlphaGo Master (4,858) and all earlier Go programs. These results demonstrate that reinforcement learning can achieve superhuman performance in a complex domain without any human input. During training, AlphaGo Zero rediscovered much of the fundamental Go knowledge accumulated by humans and also found novel strategies, developing a style of play that differs from human play. Its success highlights the potential of reinforcement learning to surpass human capabilities in challenging tasks.
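To make the training loop described above concrete, the following Python sketch shows, in miniature, how self-play data can drive such a combined network: the MCTS visit counts become the policy target and the game outcome becomes the value target. Everything in it (the toy "first to 10" game, the tabular Net stand-in for the policy/value network, the small PUCT-style search) is an illustrative assumption, not the paper's architecture or DeepMind's code.

```python
"""
Minimal sketch of the self-play training loop described above.  The toy
"first to 10" game, the tabular stand-in for the policy/value network and
the tiny PUCT-style search are illustrative assumptions, not the paper's
implementation; they only show how MCTS visit counts become the policy
target and the game outcome becomes the value target.
"""
import math
import random
from collections import defaultdict

WIN_TOTAL = 10            # toy game: players alternately add 1-3; reaching 10 wins
ACTIONS = (1, 2, 3)

def legal_actions(state):
    total, _player = state
    return [a for a in ACTIONS if total + a <= WIN_TOTAL]

def apply_action(state, action):
    total, player = state
    return (total + action, -player)          # next player to move

def winner(state):
    total, player = state
    return -player if total == WIN_TOTAL else None   # the mover who reached 10 wins

class Net:
    """Tabular stand-in for the single network combining policy and value."""
    def __init__(self):
        self.policy = defaultdict(lambda: {a: 1.0 / len(ACTIONS) for a in ACTIONS})
        self.value = defaultdict(float)       # expected outcome for the player to move

    def predict(self, state):
        return self.policy[state], self.value[state]

    def train(self, examples, lr=0.3):
        # Nudge predictions toward the MCTS search probabilities (policy target)
        # and the final game result (value target).
        for state, pi, z in examples:
            p, _ = self.predict(state)
            for a in p:
                p[a] += lr * (pi.get(a, 0.0) - p[a])
            self.value[state] += lr * (z - self.value[state])

def mcts(root, net, sims=50, c_puct=1.5):
    """PUCT-style search; returns normalized visit counts (search probabilities)."""
    N = defaultdict(int)      # visit count per (state, action)
    W = defaultdict(float)    # total backed-up value per (state, action)
    expanded = set()

    def search(state):
        if winner(state) is not None:
            return -1.0                       # the player to move has just lost
        if state not in expanded:             # leaf: expand and use the net's value
            expanded.add(state)
            _, v = net.predict(state)
            return v
        p, _ = net.predict(state)
        total_n = sum(N[(state, a)] for a in legal_actions(state))

        def ucb(a):
            q = W[(state, a)] / N[(state, a)] if N[(state, a)] else 0.0
            return q + c_puct * p[a] * math.sqrt(total_n + 1) / (1 + N[(state, a)])

        a = max(legal_actions(state), key=ucb)
        v = -search(apply_action(state, a))   # value flips between the two players
        N[(state, a)] += 1
        W[(state, a)] += v
        return v

    for _ in range(sims):
        search(root)
    visits = {a: N[(root, a)] for a in legal_actions(root)}
    total = sum(visits.values()) or 1
    return {a: n / total for a, n in visits.items()}

def self_play_game(net):
    """Play one game against itself; return (state, pi, z) training examples."""
    state, history = (0, 1), []
    while winner(state) is None:
        pi = mcts(state, net)
        history.append((state, pi))
        action = random.choices(list(pi), weights=list(pi.values()))[0]
        state = apply_action(state, action)
    w = winner(state)
    # Label every stored position with the outcome from its player's perspective.
    return [(s, pi, 1.0 if s[1] == w else -1.0) for s, pi in history]

if __name__ == "__main__":
    net = Net()
    for _ in range(200):                      # a few hundred toy games suffice here
        net.train(self_play_game(net))
    print("search probabilities from the opening position:", mcts((0, 1), net))
```

In the actual system the tabular stand-in is replaced by a deep residual network with policy and value heads trained by gradient descent, and the search runs about 1,600 simulations per move over the full 19x19 board; the overall loop, however, has the same shape as this sketch.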