Scaling and renormalization in high-dimensional regression


June 27, 2024 | Alexander Atanasov, Jacob A. Zavatone-Veth, Cengiz Pehlevan
This paper presents a derivation of the training and generalization performance of high-dimensional ridge regression models using random matrix theory and free probability, and reviews recent results in these areas for readers with backgrounds in physics and deep learning. Analytic formulas for training and generalization errors are derived using the S-transform of free probability, enabling the identification of power-law scaling regimes in model performance. The generalization error of random feature models is computed, showing that the S-transform governs the train-test generalization gap and yields a generalized cross-validation estimator. The paper derives fine-grained bias-variance decompositions for random feature models with structured covariates, revealing a scaling regime in which variance over the random features limits performance in overparameterized settings. It also demonstrates how anisotropic weight structure can limit performance and lead to nontrivial exponents for finite-width corrections. The results unify and extend earlier models of neural scaling laws.
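As a concrete instance of the train-test relation mentioned above, the generalized cross-validation estimator for ridge regression with ridge \(\lambda\), \(n\) samples, design matrix \(X\), and sample covariance \(\hat{\Sigma} = \frac{1}{n} X^\top X\) can be written in a standard form (stated here for orientation rather than as the paper's exact result):

\[ E_{\mathrm{test}} \;\approx\; \mathrm{GCV}(\lambda) \;=\; \frac{E_{\mathrm{train}}}{\left(1 - \frac{1}{n}\,\mathrm{tr}\!\left[\hat{\Sigma}\,(\hat{\Sigma} + \lambda I)^{-1}\right]\right)^{2}}. \]

The multiplicative factor separating training and test error is thus controlled by a single resolvent trace, which is the kind of quantity the S-transform encodes in the paper's framework.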
The paper is organized around three key principles: Gaussian universality, deterministic equivalence, and the S-transform. Gaussian universality states that, in high dimensions, the sample covariance matrices arising in linear regression behave as if the covariates were Gaussian with matched second moments. Deterministic equivalence allows the sample covariance to be replaced by the population covariance in algebraic expressions, at the cost of renormalizing the ridge parameter. The S-transform characterizes the spectra of products of free random matrices and supplies this renormalization. These principles are applied to derive scaling laws for linear and kernel ridge regression, showing that the S-transform controls the train-test generalization gap and gives a simple interpretation of the self-consistent equations for the generalization error. The same machinery is then applied to random feature models, yielding generalization error formulas and fine-grained bias-variance decompositions, and revealing a variance-dominated scaling regime both with and without feature noise. The analysis extends to models with additive feature noise, showing that nonlinearity can be treated as additive noise on the features. Together, the results provide a unified perspective on neural scaling laws and highlight the role of random matrix theory in understanding model performance in high-dimensional settings.
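To make deterministic equivalence and the renormalized ridge concrete, a standard form of the statement (written in generic random-matrix notation, which may differ from the paper's exact conventions) is \(\lambda\,(\hat{\Sigma} + \lambda I)^{-1} \simeq \kappa\,(\Sigma + \kappa I)^{-1}\), where \(\Sigma\) is the population covariance and the renormalized ridge \(\kappa\) solves the self-consistent equation \(\kappa = \lambda + \frac{\kappa}{n}\,\mathrm{tr}\!\left[\Sigma\,(\Sigma + \kappa I)^{-1}\right]\). The short script below (illustrative code written for this summary, not taken from the paper; names such as kappa and Sigma_hat are our own) solves this equation numerically and compares the resulting prediction against the empirical resolvent trace for Gaussian covariates with a power-law covariance spectrum.

```python
# Minimal numerical check of the deterministic equivalence sketched above.
# Illustrative code written for this summary (not from the paper); names such
# as `kappa`, `Sigma_hat`, and the power-law spectrum are our own choices.
import numpy as np

rng = np.random.default_rng(0)

n, p = 2000, 1000        # number of samples, number of features
lam = 1e-2               # bare ridge parameter lambda

# Population covariance: diagonal with a power-law spectrum (a simple
# "structured covariates" choice).
eigs = (1.0 + np.arange(p)) ** -1.5

# Gaussian covariates x_i ~ N(0, diag(eigs)) and the sample covariance.
X = rng.standard_normal((n, p)) * np.sqrt(eigs)
Sigma_hat = X.T @ X / n

# Solve the self-consistent equation
#   kappa = lam + (kappa / n) * tr[ Sigma (Sigma + kappa I)^{-1} ]
# for the renormalized ridge by fixed-point iteration.
kappa = lam
for _ in range(10_000):
    kappa_new = lam + (kappa / n) * np.sum(eigs / (eigs + kappa))
    if abs(kappa_new - kappa) < 1e-14:
        break
    kappa = kappa_new

# Compare the empirical resolvent trace with its deterministic equivalent:
#   (1/p) tr[ lam (Sigma_hat + lam I)^{-1} ]  vs  (1/p) tr[ kappa (Sigma + kappa I)^{-1} ]
lhs = lam * np.trace(np.linalg.inv(Sigma_hat + lam * np.eye(p))) / p
rhs = kappa * np.sum(1.0 / (eigs + kappa)) / p
print(f"empirical : {lhs:.5f}")
print(f"predicted : {rhs:.5f}")
```

For problem sizes like these the two printed values should agree closely; this trace-level agreement is the practical content of replacing the sample covariance by the population covariance at a renormalized ridge.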