14 Jul 2020 | Daniel J. Russo, Benjamin Van Roy, Abbas Kazerouni, Ian Osband, Zheng Wen
This tutorial introduces Thompson sampling, a Bayesian algorithm for online decision problems that balances exploration and exploitation. The algorithm is effective in a wide range of applications, including Bernoulli bandits, shortest-path problems, product recommendation, and reinforcement learning. At each step it samples a model of the environment from its posterior distribution and selects the action that is optimal under that sample, which lets it adaptively explore and exploit as data accumulates. Thompson sampling is particularly useful for problems with complex information structures, where observing the outcome of one action also informs beliefs about other actions.
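As a concrete illustration of the mechanism (a minimal sketch, not code from the tutorial), the snippet below runs Thompson sampling on a three-armed Bernoulli bandit with independent Beta(1, 1) priors; the true success probabilities are assumed values used only to simulate the environment.

```python
# Minimal Thompson sampling sketch for a Bernoulli bandit with Beta(1, 1) priors.
# The true success probabilities below are illustrative and unknown to the agent.
import numpy as np

rng = np.random.default_rng(0)
true_probs = np.array([0.1, 0.5, 0.7])   # hypothetical environment
n_arms = len(true_probs)
alpha = np.ones(n_arms)                  # Beta posterior parameters (successes + 1)
beta = np.ones(n_arms)                   # Beta posterior parameters (failures + 1)

for t in range(1000):
    # Sample a plausible success probability for each arm from its posterior.
    theta = rng.beta(alpha, beta)
    # Act greedily with respect to the sampled parameters.
    arm = int(np.argmax(theta))
    reward = rng.random() < true_probs[arm]
    # Conjugate Beta-Bernoulli update of the chosen arm's posterior.
    alpha[arm] += reward
    beta[arm] += 1 - reward
```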
The algorithm is contrasted with greedy approaches, which can be inefficient as they do not actively explore. Thompson sampling, on the other hand, intelligently allocates exploration effort by sampling from posterior distributions, leading to better performance in many scenarios. The tutorial also discusses the application of Thompson sampling to various problems, including the Bernoulli bandit and shortest path problems, and highlights its advantages over alternative methods like epsilon-greedy exploration.
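For contrast, here is a minimal epsilon-greedy agent for the same simulated bandit; the exploration rate of 0.1 is an assumed tuning parameter. Exploration here is undirected: with probability epsilon the agent picks an arm uniformly at random, regardless of how much is already known about that arm, whereas Thompson sampling concentrates exploration on arms whose posteriors remain uncertain.

```python
# Epsilon-greedy sketch on the same illustrative Bernoulli bandit, for contrast.
import numpy as np

rng = np.random.default_rng(0)
true_probs = np.array([0.1, 0.5, 0.7])   # hypothetical environment
n_arms = len(true_probs)
successes = np.zeros(n_arms)
pulls = np.zeros(n_arms)
epsilon = 0.1                            # assumed exploration rate

for t in range(1000):
    if rng.random() < epsilon:
        arm = int(rng.integers(n_arms))  # undirected, uniform exploration
    else:
        # Greedy choice based on empirical means (unpulled arms default to 0).
        means = successes / np.maximum(pulls, 1)
        arm = int(np.argmax(means))
    reward = rng.random() < true_probs[arm]
    successes[arm] += reward
    pulls[arm] += 1
```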
The tutorial covers the theoretical foundations of Thompson sampling, including its Bayesian formulation and the use of conjugacy properties for efficient inference. It also discusses practical considerations such as prior distribution specification, safety constraints, and nonstationarity. The tutorial provides guidance on approximations to Thompson sampling that can simplify computation and discusses alternative approaches to approximate posterior sampling, including Gibbs sampling, Langevin Monte Carlo, and the Laplace approximation. The goal of the tutorial is to provide a comprehensive understanding of when, why, and how to apply Thompson sampling in practice.
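To make the approximate-posterior-sampling idea concrete, the sketch below pairs Thompson sampling with a Laplace approximation for a contextual bandit with a logistic reward model: the posterior over weights is approximated by a Gaussian centered at the MAP estimate with covariance given by the inverse Hessian of the negative log posterior. The action features, observation history, and prior variance are illustrative assumptions, not values prescribed by the tutorial.

```python
# Sketch of Thompson sampling via a Laplace (Gaussian) approximation to the
# posterior over logistic-regression weights. Data and prior scale are assumed.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def laplace_posterior(X, y, prior_var=1.0, n_iter=50):
    """Return the mean (MAP) and covariance of a Gaussian approximation
    to the posterior over logistic-regression weights."""
    d = X.shape[1]
    theta = np.zeros(d)
    prior_prec = np.eye(d) / prior_var
    for _ in range(n_iter):                       # Newton's method for the MAP
        p = sigmoid(X @ theta)
        grad = X.T @ (p - y) + prior_prec @ theta
        hess = X.T @ (X * (p * (1 - p))[:, None]) + prior_prec
        theta = theta - np.linalg.solve(hess, grad)
    return theta, np.linalg.inv(hess)

# Illustrative action features and observed binary rewards so far.
actions = np.array([[1.0, 0.2], [1.0, 0.8], [1.0, 0.5]])
X_hist = np.array([[1.0, 0.2], [1.0, 0.8]])
y_hist = np.array([0.0, 1.0])

mean, cov = laplace_posterior(X_hist, y_hist)
theta_sample = rng.multivariate_normal(mean, cov)  # approximate posterior sample
best_action = int(np.argmax(actions @ theta_sample))
```

The same act-on-a-posterior-sample structure carries over when the Gaussian is replaced by samples from Gibbs sampling or Langevin Monte Carlo.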