Neural Redshift: Random Networks are not Random Functions


29 Apr 2025 | Damien Teney, Armand Mihai Nicolicioiu, Valentin Hartmann, Ehsan Abbasnejad
This paper investigates the generalization capabilities of neural networks (NNs), challenging the prevailing view that gradient descent (GD) is the primary driver of their success. The authors argue that NNs have inherent inductive biases, independent of GD, that shape their ability to generalize. Examining untrained, random-weight networks, they find that even simple MLPs exhibit strong inductive biases favoring functions of low complexity. This "simplicity bias" is not universal, however: it depends on specific components such as ReLUs, residual connections, and layer normalization, and transformers inherit it from these building blocks.

The paper introduces the term "Neural Redshift" (NRS) for the bias of NNs toward low-frequency, low-order, and compressible functions. This bias does not depend on optimization, scaling, or the data distribution, and it can be modulated or avoided by changing the architecture or the weight magnitudes. ReLU networks maintain their simplicity bias regardless of depth and weight magnitude, suggesting that ReLUs are more important than previously thought. The authors further show that the bias carries over to transformer sequence models, which generate simple, compressible sequences, and that it can be controlled to learn complex functions or to mitigate shortcut learning.

The paper thus offers an explanation for the success of deep learning that is independent of gradient-based training, highlighting the role of architecture and weight-space parametrization in determining the inductive biases of NNs. The findings suggest that the effectiveness of NNs is not an intrinsic property but the result of a match between key design choices (e.g., ReLUs) and properties of real-world data (the prevalence of low-complexity patterns).
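The core observation — that an *untrained* ReLU network tends to realize low-frequency functions, while other activations and larger weights do not — can be illustrated with a toy NumPy sketch. This is not the authors' experimental code; the network shape, weight scales, and the Fourier-power metric below are illustrative choices:

```python
import numpy as np

def random_mlp(x, depth=4, width=64, scale=1.0,
               act=lambda h: np.maximum(h, 0.0), seed=0):
    """Evaluate an untrained MLP with random Gaussian weights on 1-D inputs x."""
    rng = np.random.default_rng(seed)
    h = x.reshape(-1, 1)
    for _ in range(depth):
        W = rng.normal(0.0, scale / np.sqrt(h.shape[1]), size=(h.shape[1], width))
        b = rng.normal(0.0, scale, size=width)
        h = act(h @ W + b)
    w_out = rng.normal(0.0, scale / np.sqrt(width), size=(width, 1))
    return (h @ w_out).ravel()

def low_freq_fraction(y, k=5):
    """Fraction of Fourier power in the k lowest nonzero frequencies (near 1 = 'simple')."""
    power = np.abs(np.fft.rfft(y - y.mean())) ** 2
    return power[1:1 + k].sum() / power[1:].sum()

x = np.linspace(-1.0, 1.0, 512)
relu_out = random_mlp(x)                         # ReLU, unit-scale weights
sine_out = random_mlp(x, act=np.sin, scale=4.0)  # sine activation, larger weights
print(low_freq_fraction(relu_out), low_freq_fraction(sine_out))
```

Swapping the activation (e.g., to `np.sin`) or increasing the weight scale shifts power toward higher frequencies, which is the sense in which the simplicity bias depends on architectural choices rather than being universal.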
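The claim that transformer sequence models generate "simple, compressible" sequences can be operationalized with an off-the-shelf compressor as a crude complexity proxy. The paper's exact metric may differ; this is a minimal sketch using zlib on byte-encoded token IDs, comparing a periodic sequence against uniform noise:

```python
import random
import zlib

def compression_ratio(tokens):
    """Compressed size / raw size of a token sequence; lower means more compressible."""
    data = bytes(t % 256 for t in tokens)
    return len(zlib.compress(data, level=9)) / len(data)

random.seed(0)
repetitive = [1, 2, 3, 4] * 256                         # simple, periodic sequence
uniform = [random.randrange(256) for _ in range(1024)]  # incompressible noise
print(compression_ratio(repetitive), compression_ratio(uniform))
```

Under this proxy, outputs of a randomly initialized sequence model would be called "simple" when their compression ratio sits well below that of uniform random tokens.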