On the Complexity of Best-Arm Identification in Multi-Armed Bandit Models

16 (2016) 1-42 | Emilie Kaufmann, Olivier Cappé, Aurélien Garivier
This paper investigates the complexity of identifying the $m$ best arms in a stochastic multi-armed bandit model. The authors introduce generic notions of complexity for the two dominant frameworks: the fixed-budget and the fixed-confidence settings. In the fixed-confidence setting, they provide the first known distribution-dependent lower bound on the complexity; it involves information-theoretic quantities and holds for $m \geq 1$. For two-armed bandit models, they derive refined lower bounds in both settings, along with matching algorithms for Gaussian and Bernoulli bandits. The results show that the complexity of the fixed-budget setting can be smaller than that of the fixed-confidence setting, in contrast to the familiar behavior observed when testing fully specified alternatives. The authors also provide improved sequential stopping rules with guaranteed error probabilities and shorter average running times.
The proofs rely on two technical results of independent interest: a deviation lemma for self-normalized sums and a novel change-of-measure inequality for bandit models.
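The information-theoretic quantities in such fixed-confidence lower bounds are typically built from Kullback-Leibler divergences between arm distributions. As a minimal illustrative sketch (not the paper's exact bound), the snippet below computes the Bernoulli KL divergence and a sample-count scale of the form $\log(1/(2.4\delta))/\mathrm{KL}$; the precise divergence combination and constants are those given by the paper's theorems, and the function names here are hypothetical.

```python
import math

def bernoulli_kl(p: float, q: float, eps: float = 1e-12) -> float:
    """KL divergence KL(Bernoulli(p) || Bernoulli(q)), clipped away from {0, 1}."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def sample_count_scale(mu1: float, mu2: float, delta: float) -> float:
    # Illustrative order of magnitude for the expected number of draws needed
    # to separate two Bernoulli arms at confidence level delta; the exact
    # divergence combination comes from the paper's lower-bound theorems.
    return math.log(1.0 / (2.4 * delta)) / bernoulli_kl(mu2, mu1)
```

As expected, the closer the two means are, the smaller the divergence and the larger the resulting sample-count scale, which matches the intuition that nearly-indistinguishable arms make best-arm identification harder.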