February 1, 2008 | Abraham D. Flaxman, Adam Tauman Kalai, H. Brendan McMahan
This paper studies online convex optimization in a "bandit" setting, where the learner must minimize the total cost of a sequence of convex functions while observing only the cost incurred at each chosen point, with no direct access to the functions or their gradients. The authors extend Zinkevich's regret bounds for gradient descent in the full-information setting to this bandit scenario, achieving expected regret \(O(n^{5/6})\) against an oblivious adversary. Their key tool is a simple one-sample approximation: evaluating the function at a single randomly perturbed point yields an estimate of the gradient of a smoothed version of the function, and this noisy estimate is shown to be sufficient for gradient descent in the online setting. The paper also applies the approach to related problems, including online linear optimization, provides theoretical guarantees for the proposed algorithm, and discusses extensions to adaptive adversaries and unconstrained settings.
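The single-sample idea can be illustrated concretely. Scaling the observed cost at a random point on a small sphere by the direction of the perturbation gives a (noisy) gradient estimate, which is then fed to projected gradient descent. The sketch below is illustrative, not the paper's exact algorithm: function names and parameters (`eta`, `delta`, `radius`) are my own, and the paper's version plays the perturbed point itself and projects onto a slightly shrunken feasible set, details this simplification omits.

```python
import numpy as np

def one_point_gradient_estimate(f, x, delta, rng):
    """Estimate grad f(x) from a single evaluation: (d/delta) * f(x + delta*u) * u,
    where u is drawn uniformly from the unit sphere. In expectation this is the
    gradient of a delta-smoothed version of f."""
    d = len(x)
    u = rng.normal(size=d)
    u /= np.linalg.norm(u)          # uniform direction on the unit sphere
    return (d / delta) * f(x + delta * u) * u

def bandit_gradient_descent(fs, x0, eta, delta, radius, seed=0):
    """Projected online gradient descent over the cost sequence `fs`, using only
    one function evaluation per round (simplified from the paper's algorithm)."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    for f in fs:
        g = one_point_gradient_estimate(f, x, delta, rng)
        x = x - eta * g
        norm = np.linalg.norm(x)
        if norm > radius:           # Euclidean projection back onto the ball
            x *= radius / norm
    return x
```

On a stationary quadratic cost the iterates drift toward the minimizer despite the high variance of each single-sample estimate, which is the phenomenon the paper's regret analysis quantifies over adversarial sequences.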