14 Jan 2024 | Yue Kang*, Cho-Jui Hsieh†, Thomas C. M. Lee†
This paper introduces two efficient frameworks, G-ESTT and G-ESTS, for generalized low-rank matrix bandit problems. The problem involves selecting actions represented by matrices to maximize cumulative reward, where the reward is determined by the inner product of the action matrix with an unknown low-rank matrix Θ*. The authors propose G-ESTT, which modifies the ESTR framework using Stein's method and regularization, and G-ESTS, which improves efficiency by excluding negligible entries in the estimated subspace. Both algorithms achieve regret bounds of $\tilde{O}(\sqrt{(d_1 + d_2)M r T})$ and $\tilde{O}(\sqrt{(d_1 + d_2)^{3/2} M r^{3/2} T})$, respectively, under mild assumptions. The experiments show that G-ESTS outperforms existing methods in terms of computational efficiency and regret. The proposed frameworks are computationally feasible and achieve good performance in both theoretical and practical settings. The work contributes to the field of bandit algorithms by providing efficient solutions for low-rank matrix bandit problems.This paper introduces two efficient frameworks, G-ESTT and G-ESTS, for generalized low-rank matrix bandit problems. The problem involves selecting actions represented by matrices to maximize cumulative reward, where the reward is determined by the inner product of the action matrix with an unknown low-rank matrix Θ*. The authors propose G-ESTT, which modifies the ESTR framework using Stein's method and regularization, and G-ESTS, which improves efficiency by excluding negligible entries in the estimated subspace. Both algorithms achieve regret bounds of $\tilde{O}(\sqrt{(d_1 + d_2)M r T})$ and $\tilde{O}(\sqrt{(d_1 + d_2)^{3/2} M r^{3/2} T})$, respectively, under mild assumptions. The experiments show that G-ESTS outperforms existing methods in terms of computational efficiency and regret. The proposed frameworks are computationally feasible and achieve good performance in both theoretical and practical settings. The work contributes to the field of bandit algorithms by providing efficient solutions for low-rank matrix bandit problems.