On the Complexity of Best-Arm Identification in Multi-Armed Bandit Models


16 (2016) 1-42 | Emilie Kaufmann, Olivier Cappé, Aurélien Garivier
This paper investigates the complexity of identifying the m best arms in a stochastic multi-armed bandit model. The authors introduce generic notions of complexity for the two dominant frameworks: the fixed-budget and fixed-confidence settings.

In the fixed-confidence setting, they provide the first known distribution-dependent lower bound on the complexity that involves information-theoretic quantities. For the specific case of two-armed bandits, they derive refined lower bounds in both settings, along with matching algorithms for Gaussian and Bernoulli bandit models. These results show that the complexity of the fixed-budget setting may be smaller than that of the fixed-confidence setting, in contrast with the familiar behavior observed when testing fully specified alternatives. They also provide improved sequential stopping rules with guaranteed error probabilities and shorter average running times. The proofs rely on two technical results: a deviation lemma for self-normalized sums and a novel change-of-measure inequality for bandit models.

Comparing the complexities of best-arm identification in the two settings, the paper shows that for two-armed Gaussian bandits the complexities are equal, while for Bernoulli bandits the fixed-confidence complexity is larger. The authors propose matching algorithms for these settings, including an optimal algorithm for Gaussian bandits with known variances and a near-optimal algorithm for Bernoulli bandits using uniform sampling. The paper also provides theoretical guarantees for these algorithms and discusses their performance in numerical experiments.
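The information-theoretic quantities appearing in these complexity results include the Kullback–Leibler divergence between arm distributions and, for the two-armed fixed-budget setting, the Chernoff information. As a self-contained illustration (not code from the paper), the sketch below computes both for Bernoulli arms; using bisection to find the Chernoff point is an implementation choice, not the authors' method.

```python
import math

def kl_bernoulli(p, q):
    """KL divergence KL(Ber(p) || Ber(q)), clipped away from {0, 1}
    to implement the usual 0 * log 0 = 0 convention."""
    eps = 1e-12
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def chernoff_information(p1, p2, tol=1e-10):
    """Chernoff information between Ber(p1) and Ber(p2):
    KL(Ber(p*) || Ber(p1)) at the point p* between p1 and p2 where
    KL(p*, p1) == KL(p*, p2). The difference KL(x, a) - KL(x, b) is
    linear and increasing in x for a < b, so bisection finds p*."""
    a, b = min(p1, p2), max(p1, p2)
    lo, hi = a, b
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if kl_bernoulli(mid, a) < kl_bernoulli(mid, b):
            lo = mid  # mid is still below the balance point
        else:
            hi = mid
    p_star = 0.5 * (lo + hi)
    return kl_bernoulli(p_star, p1)
```

By symmetry of the balance condition, `chernoff_information(p1, p2)` equals `chernoff_information(p2, p1)`; for example, for arms with means 0.2 and 0.8 the Chernoff point is 0.5.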
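To give the flavor of a fixed-confidence procedure with a sequential stopping rule, here is a minimal sketch for two Gaussian arms with known variances. It samples both arms uniformly and stops when the empirical mean gap exceeds a generic deviation-style threshold; both the threshold and the sampling scheme are simplifying assumptions for illustration, not the paper's algorithm or its exact stopping rule.

```python
import math
import random

def best_arm_fixed_confidence(sample1, sample2, sigma2_sum, delta, max_rounds=100000):
    """Illustrative fixed-confidence identification for two Gaussian arms.

    sample1, sample2 -- callables returning one draw from each arm
    sigma2_sum       -- sigma_1^2 + sigma_2^2 (variances assumed known)
    delta            -- target error probability

    Stops when the empirical gap exceeds a threshold that inflates the
    gap's standard deviation by a log(t / delta)-style factor (a standard
    union-bound correction, chosen here for illustration).
    Returns (best_arm_index, rounds_used)."""
    s1 = s2 = 0.0
    for t in range(1, max_rounds + 1):
        s1 += sample1()
        s2 += sample2()
        gap = s1 / t - s2 / t
        threshold = math.sqrt(
            2 * sigma2_sum / t * math.log((1 + math.log(t + 1)) * t / delta)
        )
        if abs(gap) > threshold:
            return (0 if gap > 0 else 1), t
    # budget exhausted: fall back to the empirically better arm
    return (0 if s1 >= s2 else 1), max_rounds
```

For instance, with arms `lambda: random.gauss(2.0, 1.0)` and `lambda: random.gauss(0.0, 1.0)`, `sigma2_sum=2.0`, and `delta=0.1`, the procedure typically stops after a handful of rounds and returns arm 0. A larger gap or larger `delta` shortens the average running time, which is the trade-off the paper's improved stopping rules quantify precisely.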