2011 | Wei Chu, Lihong Li, Lev Reyzin, Robert E. Schapire
This paper studies the contextual bandit problem with linear payoff functions, focusing on the efficiency and performance of the LinUCB algorithm, the simplest known efficient algorithm for this setting. The authors prove an upper bound of \( O(\sqrt{Td \ln^3(KT \ln(T)/\delta)}) \) on the regret, where \( T \) is the number of rounds, \( d \) the dimension of the context vectors, \( K \) the number of arms, and \( \delta \) the confidence parameter. To make the analysis tractable, they introduce two related algorithms, BaseLinUCB and SupLinUCB, which decompose the problem into simpler components and reuse techniques from earlier work on bandit algorithms. The authors also discuss the practical advantages of LinUCB over competing algorithms and highlight the importance of exploiting the linear structure of the predictors. Finally, they present a matching lower bound of \( \Omega(\sqrt{Td}) \), showing that the upper bound is optimal up to logarithmic factors.
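To make the setting concrete, here is a minimal sketch of the basic LinUCB idea (not the paper's BaseLinUCB/SupLinUCB decomposition): maintain a ridge-regression estimate of the unknown parameter vector and pick the arm whose predicted payoff plus confidence width is largest. The simulation setup (`theta_star`, the context and reward generators, and the exploration parameter `alpha`) is hypothetical and chosen only for illustration.

```python
import numpy as np

def linucb(contexts_fn, reward_fn, T, d, alpha=1.0, seed=0):
    """Illustrative LinUCB sketch: ridge estimate + UCB exploration bonus.

    contexts_fn(rng) -> (K, d) array of arm contexts for the round.
    reward_fn(x, rng) -> observed payoff for the chosen context x.
    Returns (total_reward, theta_hat).
    """
    rng = np.random.default_rng(seed)
    A = np.eye(d)      # regularized design matrix: I + sum of x x^T
    b = np.zeros(d)    # running sum of r * x
    total_reward = 0.0
    for _ in range(T):
        X = contexts_fn(rng)                 # contexts for the K arms
        A_inv = np.linalg.inv(A)
        theta_hat = A_inv @ b                # ridge estimate of theta*
        # Score = predicted payoff + width of the confidence interval.
        widths = np.sqrt(np.einsum("kd,de,ke->k", X, A_inv, X))
        a = int(np.argmax(X @ theta_hat + alpha * widths))
        x = X[a]
        r = reward_fn(x, rng)
        A += np.outer(x, x)                  # rank-one update of the design
        b += r * x
        total_reward += r
    return total_reward, A_inv @ b

if __name__ == "__main__":
    d, K = 5, 10
    theta_star = np.array([0.6, -0.3, 0.2, 0.0, 0.5])  # unknown to the learner
    contexts = lambda rng: rng.standard_normal((K, d))
    reward = lambda x, rng: float(theta_star @ x)      # noiseless linear payoff
    total, theta_hat = linucb(contexts, reward, T=200, d=d)
    print(total, theta_hat)
```

In this noiseless toy run the ridge estimate converges toward `theta_star`, so the per-round regret shrinks; the point of the paper is to bound the cumulative regret of this style of algorithm as \( \tilde{O}(\sqrt{Td}) \), which SupLinUCB achieves by carefully controlling the dependence between rounds.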