Get rich quick: exact solutions reveal how unbalanced initializations promote rapid feature learning

10 Jun 2024 | Daniel Kunin, Allan Raventós, Clémentine Domine, Feng Chen, David Klindt, Andrew Saxe, Surya Ganguli
This paper examines how unbalanced layer-specific initialization variances and learning rates govern the transition between the lazy and rich learning regimes in neural networks. The authors derive exact solutions for a minimal model that transitions between these regimes, showing precisely how these hyperparameters determine the degree of feature learning. They find that unbalanced initializations which promote faster learning in earlier layers can induce rapid feature learning, especially in nonlinear networks. Extending the analysis to more complex linear and nonlinear models, they demonstrate that such initializations can accelerate rich learning and improve generalization. Experiments further show that upstream initializations drive feature learning in deep networks, improve the interpretability of early layers in CNNs, reduce the sample complexity of learning hierarchical data, and decrease the time to grokking in modular arithmetic. The paper underscores the role of unbalanced initializations in efficient feature learning, provides theoretical insight into neural network training dynamics, and calls for further research into initialization schemes that optimize feature learning in deep networks.
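
To make "layer-specific initialization variances and learning rates" concrete, below is a minimal, illustrative sketch (not the paper's exact model or notation): a two-layer linear network trained by full-batch gradient descent, where each layer gets its own initialization scale and learning rate, and the relative movement of the first-layer weights is used as a crude proxy for feature learning. All dimensions, values, and the movement metric are assumptions chosen for illustration.

```python
# Illustrative sketch only: a two-layer linear network y = W2 @ W1 @ x with
# layer-specific initialization scales (sigma1, sigma2) and learning rates
# (eta1, eta2). Names and values are assumptions, not the paper's setup.
import numpy as np

rng = np.random.default_rng(0)
d_in, d_hid, d_out, n = 10, 32, 1, 200

# Synthetic linear teacher data.
X = rng.normal(size=(n, d_in))
w_teacher = rng.normal(size=(d_in, d_out))
Y = X @ w_teacher

def train(sigma1, sigma2, eta1, eta2, steps=3000):
    """Full-batch gradient descent with per-layer init scales and learning rates."""
    W1 = sigma1 * rng.normal(size=(d_hid, d_in))
    W2 = sigma2 * rng.normal(size=(d_out, d_hid))
    W1_init = W1.copy()
    for _ in range(steps):
        pred = X @ W1.T @ W2.T              # forward pass, shape (n, d_out)
        err = pred - Y                      # residuals
        grad_W2 = err.T @ (X @ W1.T) / n    # dL/dW2 for L = mean squared error / 2
        grad_W1 = W2.T @ err.T @ X / n      # dL/dW1
        W2 -= eta2 * grad_W2
        W1 -= eta1 * grad_W1
    loss = 0.5 * np.mean((X @ W1.T @ W2.T - Y) ** 2)
    # Relative movement of the first-layer weights: a crude proxy for how much
    # the learned "features" changed (rich) versus stayed near init (lazy).
    movement = np.linalg.norm(W1 - W1_init) / np.linalg.norm(W1_init)
    return loss, movement

# Balanced layers versus an unbalanced configuration chosen so the first
# (upstream) layer moves faster: small first-layer init with a larger
# second-layer init. The learning rates eta1/eta2 could be made unequal
# in the same spirit.
for name, cfg in [
    ("balanced  ", dict(sigma1=0.1, sigma2=0.1, eta1=0.05, eta2=0.05)),
    ("unbalanced", dict(sigma1=0.01, sigma2=0.5, eta1=0.05, eta2=0.05)),
]:
    loss, movement = train(**cfg)
    print(f"{name}: loss={loss:.2e}, relative first-layer movement={movement:.2f}")
```

In this toy setting, both configurations fit the data, but the unbalanced one (small first-layer initialization paired with a larger downstream scale, so the gradient reaching the first layer is large relative to its initial norm) typically shows much greater relative movement of the first-layer weights, echoing the paper's qualitative message that unbalance which speeds up learning in earlier layers favors the rich, feature-learning regime.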