Online convex optimization in the bandit setting: gradient descent without a gradient

February 1, 2008 | Abraham D. Flaxman, Adam Tauman Kalai, H. Brendan McMahan
This paper presents a method for online convex optimization in the bandit setting, where the gradient of the cost function is not directly accessible: in each round the learner picks a point and observes only the cost incurred at that point. The approach approximates the gradient from a single random function evaluation, enabling gradient descent without explicit gradient computation. Concretely, the gradient is estimated by perturbing the current point along a random unit vector and evaluating the cost function once at the perturbed point; this one-point estimate is shown to be sufficient for gradient descent in the bandit setting because, in expectation, it equals the gradient of a smoothed version of the cost function.

The algorithm achieves an expected regret bound of O(n^{5/6}) against an oblivious adversary. This is worse than the O(√n) bound achievable in the full-information setting, but improves on what other bandit algorithms achieve in this general convex setting. The paper also discusses related work, including online linear optimization and the use of isotropic transformations to simplify the feasible set. The algorithm is analyzed in detail, and regret bounds are derived under different assumptions, including Lipschitz continuity of the cost functions. The results show that the algorithm performs well even in adversarial settings, and the paper concludes with implications for future research in online optimization.
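The one-point gradient estimator at the heart of the method is simple to state: sample u uniformly from the unit sphere, then the scaled observation (d/δ)·c(x + δu)·u is, in expectation, the gradient of a smoothed version of c (its average over the ball of radius δ around x), so it can stand in for the true gradient in Zinkevich-style projected gradient descent. The sketch below illustrates this idea; it is a simplified reading of the paper's approach, not a faithful reproduction. The names (bandit_gradient_descent, project) are placeholders, and the paper's additional care in keeping the perturbed point inside a shrunk feasible set is omitted here.

```python
import numpy as np

def bandit_gradient_descent(costs, x0, eta, delta, project, rng=None):
    """Bandit gradient descent via one-point gradient estimates (sketch).

    costs   : sequence of cost functions c_t (only c_t(y_t) is observed)
    x0      : starting point inside the feasible set
    eta     : gradient-descent step size
    delta   : perturbation radius of the one-point estimator
    project : Euclidean projection onto the feasible set
    """
    rng = rng or np.random.default_rng(0)
    x = np.asarray(x0, dtype=float)
    d = x.shape[0]
    total_cost = 0.0
    for c in costs:
        u = rng.standard_normal(d)
        u /= np.linalg.norm(u)          # uniform direction on the unit sphere
        y = x + delta * u               # the point actually played
        loss = c(y)                     # the only feedback: a single scalar
        total_cost += loss
        g = (d / delta) * loss * u      # one-point gradient estimate
        x = project(x - eta * g)        # projected descent step
    return total_cost
```

For example, with the unit ball as the feasible set one could take project = lambda z: z / max(1.0, np.linalg.norm(z)); in the paper, eta and delta are tuned as functions of the horizon n to obtain the stated regret bounds.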