14 Jul 2020 | Daniel J. Russo, Benjamin Van Roy, Abbas Kazerouni, Ian Osband, Zheng Wen
Thompson sampling is an algorithm for online decision-making problems that balances exploitation and exploration. The tutorial illustrates the algorithm through a range of examples, including Bernoulli bandit problems, shortest path problems, product recommendation, active learning with neural networks, and reinforcement learning in Markov decision processes. It discusses when and why Thompson sampling is effective and how it relates to alternative algorithms, and it addresses practical considerations such as prior distribution specification, safety constraints, and nonstationarity.

The introduction presents the multi-armed bandit problem, emphasizing the trade-off between exploiting actions known to perform well and exploring to gather new information. The tutorial explains the idea behind Thompson sampling, its historical context, and its recent resurgence in industry and academia. It compares Thompson sampling to greedy algorithms, showing how Thompson sampling allocates exploration effort more intelligently.

The discussion then extends to more general online decision problems, providing a formal framework for Thompson sampling and greedy algorithms. Detailed examples, such as the beta-Bernoulli bandit and the shortest path problem, illustrate the algorithm's behavior and performance. The tutorial also covers approximate methods for Thompson sampling, including Gibbs sampling, Langevin Monte Carlo, the Laplace approximation, and the bootstrap, which handle complex models where exact Bayesian inference is infeasible.
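To make the beta-Bernoulli example concrete, here is a minimal sketch of Thompson sampling versus a greedy rule on a three-armed Bernoulli bandit with conjugate Beta posteriors. It is a rough illustration rather than the tutorial's reference implementation; the arm means, horizon, and uniform Beta(1, 1) priors are assumptions chosen only for demonstration.

```python
# Sketch: Thompson sampling vs. a greedy rule on a beta-Bernoulli bandit.
# The arm means, horizon, and Beta(1, 1) priors are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
true_means = np.array([0.45, 0.55, 0.60])  # unknown success probabilities
horizon = 1000

def run(select_arm):
    # Beta(alpha, beta) posterior per arm, starting from a uniform prior.
    alpha = np.ones(len(true_means))
    beta = np.ones(len(true_means))
    total_reward = 0.0
    for _ in range(horizon):
        arm = select_arm(alpha, beta)
        reward = float(rng.random() < true_means[arm])  # Bernoulli reward
        alpha[arm] += reward       # conjugate update on success
        beta[arm] += 1 - reward    # conjugate update on failure
        total_reward += reward
    return total_reward

def thompson(alpha, beta):
    # Draw one plausible mean per arm from its posterior and act greedily
    # with respect to that draw: this is Thompson sampling.
    return int(np.argmax(rng.beta(alpha, beta)))

def greedy(alpha, beta):
    # Act on the posterior-mean estimate only; no deliberate exploration.
    return int(np.argmax(alpha / (alpha + beta)))

print("Thompson sampling reward:", run(thompson))
print("Greedy reward:           ", run(greedy))
```

Over many runs, the greedy rule tends to lock onto whichever arm looks best early on, while the sampled posteriors keep Thompson sampling trying plausible alternatives until the posteriors concentrate, which is the allocation-of-exploration point the tutorial makes.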
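The bootstrap approximation mentioned above can be sketched in a similarly hedged way: instead of drawing from an exact posterior, each arm's mean is re-estimated from a resample of its observed rewards. The pseudo-observations that stand in for a prior and the problem parameters below are illustrative assumptions, not choices taken from the tutorial.

```python
# Sketch: bootstrap approximation to Thompson sampling on a Bernoulli bandit.
# Problem parameters and the pseudo-observation "prior" are assumptions.
import numpy as np

rng = np.random.default_rng(1)
true_means = np.array([0.45, 0.55, 0.60])
horizon = 1000

# One pseudo-observation of reward 1 and one of reward 0 per arm plays the
# role of a prior and keeps early bootstrap estimates from collapsing.
history = [[1.0, 0.0] for _ in true_means]

total_reward = 0.0
for _ in range(horizon):
    estimates = []
    for obs in history:
        # Resample each arm's history with replacement; the resampled mean
        # serves as a stand-in for a posterior draw.
        sample = rng.choice(obs, size=len(obs), replace=True)
        estimates.append(sample.mean())
    arm = int(np.argmax(estimates))
    reward = float(rng.random() < true_means[arm])
    history[arm].append(reward)
    total_reward += reward

print("Bootstrap Thompson sampling reward:", total_reward)
```

The same resampling idea extends to settings where no conjugate posterior is available, which is why the tutorial groups it with Gibbs sampling, Langevin Monte Carlo, and the Laplace approximation as approximate posterior-sampling schemes.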