13 Mar 2024 | Hong Hu, Yue M. Lu, Theodor Misiakiewicz
This paper investigates the asymptotics of random feature ridge regression (RFRR) in the high-dimensional polynomial scaling regime, where the number of parameters \( p \), sample size \( n \), and dimension \( d \) all diverge while \( p/d^{\kappa_1} \) and \( n/d^{\kappa_2} \) remain constant for \( \kappa_1, \kappa_2 \in \mathbb{R}_{>0} \). RFRR serves both as a simplified model for neural networks trained in the lazy regime and as a finite-rank approximation to kernel ridge regression (KRR). The authors compute sharp asymptotics for the RFRR test error, characterizing how the number of random features and the regularization parameter affect test performance. They find that RFRR exhibits a trade-off between approximation and generalization power. Specifically, if \( n = o(p) \), the sample size is the bottleneck and RFRR achieves the same performance as KRR. Conversely, if \( p = o(n) \), the number of random features is the limiting factor and the RFRR test error matches the approximation error of the random feature model class. A double descent phenomenon appears at the interpolation threshold \( p = n \): as the number of features grows, the test error first decreases, peaks near \( p = n \), and then decreases again. These results complete the characterization of RFRR's performance in the high-dimensional regime, providing insight into the interplay between approximation and generalization in overparametrized models.
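To make the setup concrete, here is a minimal sketch of an RFRR experiment one could run to see the trade-off described above. The choice of ReLU features, the toy single-index target, and the noise level are illustrative assumptions, not taken from the paper, which treats a much more general setting.

```python
import numpy as np

def rfrr_test_error(n, p, d, lam, n_test=2000, rng=None):
    """Minimal random feature ridge regression (RFRR) sketch.

    Features: x -> ReLU(W x / sqrt(d)) with W i.i.d. Gaussian (fixed, untrained),
    then ridge regression on the features. The target is a toy single-index
    function, a hypothetical stand-in for the paper's general target class.
    """
    rng = np.random.default_rng(rng)
    W = rng.standard_normal((p, d))             # random first-layer weights (kept fixed)
    beta = rng.standard_normal(d) / np.sqrt(d)  # hypothetical teacher direction

    def features(X):
        return np.maximum(X @ W.T / np.sqrt(d), 0.0)  # ReLU random features

    def target(X):
        return np.tanh(X @ beta)                      # toy nonlinear target

    # Training data and the ridge solution on the random features
    X_train = rng.standard_normal((n, d))
    y_train = target(X_train) + 0.1 * rng.standard_normal(n)
    Z = features(X_train)
    a = np.linalg.solve(Z.T @ Z + lam * np.eye(p), Z.T @ y_train)

    # Test error of the fitted RFRR predictor
    X_test = rng.standard_normal((n_test, d))
    y_hat = features(X_test) @ a
    return np.mean((y_hat - target(X_test)) ** 2)

# Sweeping p below and above n (at small regularization) illustrates the
# double descent shape around p = n discussed in the abstract.
for p in [50, 100, 200, 400, 800]:
    print(p, rfrr_test_error(n=200, p=p, d=50, lam=1e-3, rng=0))
```

The sketch is only meant to illustrate the roles of \( n \), \( p \), and the regularization parameter; the paper's contribution is the sharp asymptotic characterization of this test error in the polynomial scaling regime, not a simulation.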