ON THE SATURATION EFFECT OF KERNEL RIDGE REGRESSION

28 May 2024 | Yicheng Li, Haobo Zhang & Qian Lin
The paper addresses the saturation effect in kernel ridge regression (KRR), a widely used non-parametric regression method. The saturation effect refers to the phenomenon in which KRR fails to achieve the information-theoretic lower bound on the generalization error once the smoothness of the underlying regression function exceeds a certain level. This effect has been observed empirically for decades, and a conjecture about the saturation lower bound of KRR has long been open; the main contribution of this paper is a rigorous proof of that conjecture. The authors first review the basics of KRR and the saturation effect, including the conditions under which KRR attains the information-theoretic lower bound and those under which it fails to do so. They then introduce two additional assumptions, one on the kernel and one on the noise. Under these assumptions, they prove that for any choice of regularization parameter tending to zero, the generalization error of KRR cannot decay faster than \( n^{-\frac{2}{2+\beta}} \), where \( \beta \) is a quantity characterizing the capacity of the RKHS. This result rigorously confirms the long-standing conjecture about the saturation effect in KRR.
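For readers who want the estimator itself in front of them, the following is a minimal NumPy sketch of KRR with a Gaussian kernel; the kernel choice, bandwidth, regularization convention \( \alpha = (K + n\lambda I)^{-1} y \), and toy data are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def gaussian_kernel(X, Y, bandwidth=0.2):
    # Pairwise Gaussian (RBF) kernel matrix: k(x, y) = exp(-||x - y||^2 / (2 * bandwidth^2)).
    sq_dists = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))

def krr_fit_predict(X_train, y_train, X_test, lam, bandwidth=0.2):
    # KRR with the (1/n) empirical-loss convention: alpha = (K + n * lam * I)^{-1} y,
    # and predictions f_lambda(x) = sum_i alpha_i k(x, x_i).
    n = X_train.shape[0]
    K = gaussian_kernel(X_train, X_train, bandwidth)
    alpha = np.linalg.solve(K + n * lam * np.eye(n), y_train)
    return gaussian_kernel(X_test, X_train, bandwidth) @ alpha

# Toy usage: noisy samples of a smooth target on [0, 1].
rng = np.random.default_rng(0)
n = 200
X_train = rng.uniform(0.0, 1.0, size=(n, 1))
y_train = np.sin(2 * np.pi * X_train[:, 0]) + 0.1 * rng.standard_normal(n)
X_test = np.linspace(0.0, 1.0, 100)[:, None]
y_pred = krr_fit_predict(X_train, y_train, X_test, lam=1e-3)
```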
The proof rests on a bias-variance decomposition: the bias term is lower bounded by a constant times \( \lambda^2 \), while the variance term is lower bounded by \( \frac{\sigma^2}{n} \int_{\mathcal{X}} \|(T_X + \lambda)^{-1} k(x, \cdot)\|_{L^2}^2 \, \mathrm{d}\mu(x) \) (with \( \mu \) the marginal distribution of the covariates), which in turn is lower bounded by a constant times \( \frac{\lambda^{-\beta}}{n} \). Balancing the two terms yields the \( n^{-\frac{2}{2+\beta}} \) lower bound, as sketched below. The paper also reports numerical experiments that confirm the theory, showing that the convergence rate of KRR matches the saturation lower bound when the regression function is sufficiently smooth. The results suggest that KRR can be outperformed by other spectral regularization algorithms, such as spectral cut-off and kernel gradient descent, which achieve optimal rates without saturating. In conclusion, the paper provides a complete proof of the saturation effect in KRR, highlighting its limitations for highly smooth regression functions and suggesting that alternative spectral regularization methods may be more effective in such scenarios.
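The rate in the main theorem can be recovered from these two bounds by balancing them in \( \lambda \); a brief sketch of that step, writing \( \hat f_\lambda \) for the KRR estimator and \( f^* \) for the regression function (notation introduced here for illustration), is:
\[
\mathbb{E}\,\big\| \hat f_\lambda - f^* \big\|_{L^2}^2 \;\gtrsim\; \lambda^2 + \frac{\sigma^2 \lambda^{-\beta}}{n},
\qquad
\min_{\lambda > 0} \Big( \lambda^2 + \frac{\sigma^2 \lambda^{-\beta}}{n} \Big) \asymp n^{-\frac{2}{2+\beta}}
\quad \text{attained at } \lambda \asymp n^{-\frac{1}{2+\beta}},
\]
so no choice of regularization parameter tending to zero can make the generalization error decay faster than \( n^{-\frac{2}{2+\beta}} \).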