Efficient Frameworks for Generalized Low-Rank Matrix Bandit Problems

Efficient Frameworks for Generalized Low-Rank Matrix Bandit Problems

14 Jan 2024 | Yue Kang, Cho-Jui Hsieh, Thomas C. M. Lee
This paper addresses the generalized low-rank matrix bandit problem, where the expected reward of an action is the inner product between the action's feature matrix and an unknown low-rank matrix. The authors propose two efficient frameworks, G-ESTT and G-ESTS, to tackle this problem. G-ESTT uses Stein's method for subspace estimation and then leverages the estimated subspaces to transform the problem into a sparse generalized linear bandit. G-ESTS, on the other hand, excludes the negligible entries of the estimated subspaces, reducing the dimensionality of the problem. Both frameworks achieve regret bounds that are competitive with existing methods, with G-ESTT achieving a regret bound of \(\tilde{O}((d_1 + d_2)^{3/2} r \sqrt{rT/D_{rr}})\) and G-ESTS achieving a regret bound of \(\tilde{O}(\sqrt{(d_1 + d_2)^{7/4} T^{3/4}})\). Experimental results validate the effectiveness of these algorithms, showing that they outperform existing methods in terms of regret and computational efficiency.This paper addresses the generalized low-rank matrix bandit problem, where the expected reward of an action is the inner product between the action's feature matrix and an unknown low-rank matrix. The authors propose two efficient frameworks, G-ESTT and G-ESTS, to tackle this problem. G-ESTT uses Stein's method for subspace estimation and then leverages the estimated subspaces to transform the problem into a sparse generalized linear bandit. G-ESTS, on the other hand, excludes the negligible entries of the estimated subspaces, reducing the dimensionality of the problem. Both frameworks achieve regret bounds that are competitive with existing methods, with G-ESTT achieving a regret bound of \(\tilde{O}((d_1 + d_2)^{3/2} r \sqrt{rT/D_{rr}})\) and G-ESTS achieving a regret bound of \(\tilde{O}(\sqrt{(d_1 + d_2)^{7/4} T^{3/4}})\). Experimental results validate the effectiveness of these algorithms, showing that they outperform existing methods in terms of regret and computational efficiency.
Reach us at info@study.space