Efficient Selectivity and Backup Operators in Monte-Carlo Tree Search

Efficient Selectivity and Backup Operators in Monte-Carlo Tree Search

May 2006 | Rémi Coulom
Rémi Coulom presents a new algorithm for combining Monte-Carlo evaluation with tree search in Monte-Carlo Tree Search (MCTS). The algorithm avoids separating the min-max phase from the Monte-Carlo phase, instead using a general backup operator that transitions from averaging to min-max as simulations increase. This allows fine-grained control of tree growth and efficient selectivity methods. The algorithm was implemented in the 9×9 Go-playing program Crazy Stone, which won the 10th KGS computer-Go tournament. The traditional approach for two-player zero-sum games uses alpha-beta search with a heuristic evaluator, but this has failed for Go due to its complexity. Monte-Carlo evaluation, which averages outcomes of random continuations, is more suitable for Go's dynamic nature. It has been applied to deterministic games by randomly choosing actions until a terminal state is reached. The paper introduces a new algorithm that combines Monte-Carlo evaluation with tree search. It uses a selective method to allocate simulations, based on the probability of a move being better than the current best. The algorithm uses a formula that approximates Gaussian distributions to determine the probability of a move being better. It also introduces a backup method that balances between the mean and max operators to improve accuracy. The algorithm was tested against other programs, including GNU Go and Indigo, and showed superior performance. Crazy Stone outperformed Indigo in 100-game matches, indicating the algorithm's efficiency. However, Indigo's knowledge-based pre-selector and different simulation methods may have contributed to its performance. The paper concludes that the new algorithm improves MCTS by introducing an efficient backup method. Future research directions include improving selectivity algorithms, incorporating game-specific knowledge, and scaling the approach to larger boards. The algorithm's success in Go demonstrates its potential for other complex games.Rémi Coulom presents a new algorithm for combining Monte-Carlo evaluation with tree search in Monte-Carlo Tree Search (MCTS). The algorithm avoids separating the min-max phase from the Monte-Carlo phase, instead using a general backup operator that transitions from averaging to min-max as simulations increase. This allows fine-grained control of tree growth and efficient selectivity methods. The algorithm was implemented in the 9×9 Go-playing program Crazy Stone, which won the 10th KGS computer-Go tournament. The traditional approach for two-player zero-sum games uses alpha-beta search with a heuristic evaluator, but this has failed for Go due to its complexity. Monte-Carlo evaluation, which averages outcomes of random continuations, is more suitable for Go's dynamic nature. It has been applied to deterministic games by randomly choosing actions until a terminal state is reached. The paper introduces a new algorithm that combines Monte-Carlo evaluation with tree search. It uses a selective method to allocate simulations, based on the probability of a move being better than the current best. The algorithm uses a formula that approximates Gaussian distributions to determine the probability of a move being better. It also introduces a backup method that balances between the mean and max operators to improve accuracy. The algorithm was tested against other programs, including GNU Go and Indigo, and showed superior performance. Crazy Stone outperformed Indigo in 100-game matches, indicating the algorithm's efficiency. However, Indigo's knowledge-based pre-selector and different simulation methods may have contributed to its performance. The paper concludes that the new algorithm improves MCTS by introducing an efficient backup method. Future research directions include improving selectivity algorithms, incorporating game-specific knowledge, and scaling the approach to larger boards. The algorithm's success in Go demonstrates its potential for other complex games.
Reach us at info@study.space
Understanding Efficient Selectivity and Backup Operators in Monte-Carlo Tree Search