Introduction to Multi-Armed Bandits

November 2019 | Aleksandrs Slivkins
This book provides an introductory, textbook-like treatment of multi-armed bandits, a framework for algorithms that make decisions under uncertainty. It covers several lines of work, including IID rewards, adversarial rewards, contextual bandits, and connections to economics.

The book is organized into chapters, each devoted to a specific topic, with a self-contained introduction, a review of further developments, and exercises. It emphasizes accessibility, favoring fundamental ideas and elementary proofs over the strongest possible results. The material grew out of a graduate course at the University of Maryland and has been updated to reflect recent developments. The book avoids deep connections to online convex optimization and reinforcement learning, and does not discuss Markovian models. It is complemented by another recent book, "Bandit Algorithms" by Lattimore and Szepesvári, which covers a largely different set of topics.

Multi-armed bandits are introduced through motivating examples such as news websites, dynamic pricing, and investment. The book discusses the tradeoff between exploration and exploitation and presents algorithms such as Explore-First, Epsilon-greedy, Successive Elimination, and UCB1, with a focus on regret analysis and performance bounds.

Further chapters survey a range of modeling dimensions, including auxiliary feedback, reward models, contexts, Bayesian priors, structured rewards, and global constraints, and discuss application domains such as medical trials, web applications, economics, and computer systems.
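In the IID-rewards setting, the regret these algorithms aim to minimize is the gap between always playing the best fixed arm and the algorithm's actual choices. With μ(a) denoting the mean reward of arm a, the standard definition over a horizon of T rounds is:

```latex
% Expected regret after T rounds; a_t is the arm chosen at round t.
R(T) = \mu^* \cdot T - \sum_{t=1}^{T} \mu(a_t),
\qquad \text{where } \mu^* = \max_{a} \mu(a).
```

To make the exploration-exploitation tradeoff concrete, below is a minimal sketch of the Epsilon-greedy rule in Python. It is not code from the book; the function name, arm distributions, and parameter values are illustrative assumptions.

```python
import random

def epsilon_greedy(arms, horizon, epsilon=0.1, seed=0):
    """Minimal Epsilon-greedy sketch (not the book's code): with
    probability `epsilon` pull a uniformly random arm (explore),
    otherwise pull the arm with the best empirical mean (exploit).
    `arms` is a list of callables, each returning a reward in [0, 1]."""
    rng = random.Random(seed)
    counts = [0] * len(arms)    # number of pulls per arm
    means = [0.0] * len(arms)   # empirical mean reward per arm
    total = 0.0
    for _ in range(horizon):
        if rng.random() < epsilon:
            a = rng.randrange(len(arms))                       # explore
        else:
            a = max(range(len(arms)), key=lambda i: means[i])  # exploit
        reward = arms[a]()
        counts[a] += 1
        means[a] += (reward - means[a]) / counts[a]  # running-mean update
        total += reward
    return total, means

# Hypothetical example: two Bernoulli arms with means 0.4 and 0.6.
if __name__ == "__main__":
    arm_rng = random.Random(1)
    arms = [lambda: float(arm_rng.random() < 0.4),
            lambda: float(arm_rng.random() < 0.6)]
    total, means = epsilon_greedy(arms, horizon=10_000)
    print(f"total reward: {total:.0f}, empirical means: {means}")
```

UCB1 removes the fixed exploration rate: each arm's empirical mean is inflated by an optimism bonus on the order of sqrt(2 ln t / n_a), so under-explored arms are tried automatically and exploration tapers off as pull counts grow.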