Asymptotics of Random Feature Regression Beyond the Linear Scaling Regime

13 Mar 2024 | Hong Hu, Yue M. Lu, Theodor Misiakiewicz
This paper investigates the asymptotic behavior of random feature ridge regression (RFRR) in the high-dimensional polynomial scaling regime, where the number of parameters $p$, the number of samples $n$, and the dimension $d$ all grow to infinity. The study focuses on how the test error of RFRR depends on $p$ and on the regularization parameter $\lambda$, and on how to choose $p$ relative to $n$ to achieve the optimal test error. RFRR is both a finite-rank approximation of kernel ridge regression (KRR) and a simplified model for neural networks trained in the lazy regime.

The analysis considers data uniformly distributed on the sphere $\mathbb{S}^{d-1}(\sqrt{d})$ of radius $\sqrt{d}$ in $\mathbb{R}^d$, and derives sharp asymptotics for the RFRR test error in the polynomial scaling regime, where $p, n, d \to \infty$ with $p/d^{\kappa_1}$ and $n/d^{\kappa_2}$ held constant, for arbitrary exponents $\kappa_1, \kappa_2 \in \mathbb{R}_{>0}$. The results reveal a trade-off between the approximation and generalization power of RFRR.

When $n = o(p)$, the sample size is the bottleneck, and RFRR achieves the same performance as KRR (the $p = \infty$ limit). When $p = o(n)$, the number of random features is the limiting factor, and the RFRR test error matches the approximation error of the random feature model class (the $n = \infty$ limit). A double descent phenomenon appears at $n = p$; it had previously been characterized only in the linear scaling regime $\kappa_1 = \kappa_2 = 1$. More generally, the RFRR test error can be characterized as the maximum of the KRR test error and the approximation error, giving a simple trade-off between statistical and approximation errors.

These results are further refined in the general polynomial scaling regime, where the test error depends on the target function, the activation function, and the scalings $\kappa_1, \kappa_2, \theta_1, \theta_2$. The analysis shows that the RFRR test error can be non-monotonic in $n$ or $p$ in certain regimes, and that the optimal regularization parameter depends on the target function and the scaling parameters. Finally, the asymptotic risk of RFRR in the polynomial scaling regime is shown to coincide with that of a simpler Gaussian covariate model, which provides useful insight into the behavior of RFRR in high-dimensional settings.
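For concreteness, here is a minimal sketch of the RFRR estimator in standard notation; the normalization and feature map below follow one common convention and may differ in detail from the paper's definitions:

$$
\hat f_{\mathrm{RF}}(\boldsymbol{x}) \;=\; \sum_{j=1}^{p} \hat a_j\, \sigma\big(\langle \boldsymbol{w}_j, \boldsymbol{x}\rangle\big),
\qquad
\hat{\boldsymbol{a}} \;=\; \arg\min_{\boldsymbol{a} \in \mathbb{R}^p} \Bigg\{ \sum_{i=1}^{n} \Big( y_i - \sum_{j=1}^{p} a_j\, \sigma\big(\langle \boldsymbol{w}_j, \boldsymbol{x}_i\rangle\big) \Big)^{2} + \lambda \|\boldsymbol{a}\|_2^2 \Bigg\},
$$

where $\boldsymbol{w}_1, \dots, \boldsymbol{w}_p$ are random weights drawn independently of the data and $\sigma$ is the activation function. Since the learned function lies in the span of the $p$ random features, the induced kernel $\frac{1}{p}\sum_{j=1}^{p} \sigma(\langle \boldsymbol{w}_j, \boldsymbol{x}\rangle)\, \sigma(\langle \boldsymbol{w}_j, \boldsymbol{x}'\rangle)$ has rank at most $p$ and converges to the full kernel as $p \to \infty$, which is the sense in which RFRR is a finite-rank approximation of KRR.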
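The following is an illustrative Python sketch of RFRR on synthetic spherical data, showing how the test error can be computed for different values of $p$ relative to $n$. The target function, noise level, activation, and the specific values of $n$, $p$, $d$, $\lambda$ are arbitrary choices for illustration, not the paper's experimental setup.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_sphere(m, d, radius):
    """Draw m points uniformly from the sphere of the given radius in R^d."""
    z = rng.standard_normal((m, d))
    return radius * z / np.linalg.norm(z, axis=1, keepdims=True)

def rfrr_test_error(n, p, d, lam, n_test=2000, noise=0.1):
    """Fit random feature ridge regression and return its test error.

    The target f_*(x) = ReLU(<beta, x> / sqrt(d)) is an arbitrary
    illustrative choice, not the target class studied in the paper.
    """
    beta = sample_sphere(1, d, np.sqrt(d))[0]
    X_train = sample_sphere(n, d, np.sqrt(d))      # covariates on S^{d-1}(sqrt(d))
    X_test = sample_sphere(n_test, d, np.sqrt(d))
    target = lambda X: np.maximum(X @ beta / np.sqrt(d), 0.0)
    y_train = target(X_train) + noise * rng.standard_normal(n)
    y_test = target(X_test)

    # Random first-layer weights, uniform on the unit sphere; ReLU features.
    W = sample_sphere(p, d, 1.0)
    Phi_train = np.maximum(X_train @ W.T, 0.0)     # shape (n, p)
    Phi_test = np.maximum(X_test @ W.T, 0.0)       # shape (n_test, p)

    # Ridge solution: a_hat = (Phi'Phi + n*lam*I)^{-1} Phi'y.
    a_hat = np.linalg.solve(
        Phi_train.T @ Phi_train + n * lam * np.eye(p),
        Phi_train.T @ y_train,
    )
    return np.mean((Phi_test @ a_hat - y_test) ** 2)

if __name__ == "__main__":
    d, n, lam = 30, 1000, 1e-3
    for p in (50, 1000, 4000):                     # p << n, p = n, p >> n
        print(f"n={n}, p={p:5d}: test error = {rfrr_test_error(n, p, d, lam):.4f}")
```

Sweeping $p$ across the three regimes (many fewer, equal to, and many more features than samples) gives a finite-size impression of the approximation-limited, interpolation-threshold, and sample-limited behaviors discussed above, though the sharp asymptotics of the paper concern the large $n, p, d$ limit.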