Repetita Iuvant: Data Repetition Allows SGD to Learn High-Dimensional Multi-Index Functions

2025-02-10 | Luca Arnaboldi¹, Yatin Dandi¹,², Florent Krzakala¹, Luca Pesci¹, and Ludovic Stephan³
This paper investigates the training dynamics of two-layer shallow neural networks trained with gradient-based algorithms, focusing on how they learn the relevant features of multi-index models. In the high-dimensional regime, where the input dimension d diverges, a simple modification of the idealized single-pass training scenario — allowing the data to be repeated, or iterated upon, twice — drastically improves the computational efficiency of Stochastic Gradient Descent (SGD). This surpasses the limitations previously believed to be dictated by the Information and Leap exponents of the target function, and highlights the ability of networks to learn relevant structure from the data alone, without any pre-processing.

Concretely, most multi-index functions are learned with total sample complexity O(d), or O(d log d) when the remaining hardness stems from a symmetry of the target. Data repetition also enables the learning of hard functions, such as sparse parities, through a hierarchical mechanism that generalizes the notion of staircase functions. These findings are established through a rigorous study of the evolution of the relevant summary statistics of the high-dimensional dynamics. The paper further extends the notion of the generative exponent to the multi-index setting and analyzes the dynamics of specific gradient-based algorithms, matching recent results on the computational efficiency of SGD for learning sparse k-parities. It also shows that a simple gradient-based algorithm, which only requires seeing each data point twice, is optimal for learning any polynomial. Together, the results demonstrate that shallow neural networks are significantly more powerful than previously believed at identifying relevant features in the data.
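The mechanism described above — taking more than one gradient step on the same fresh sample — can be illustrated with a minimal sketch. The snippet below is NOT the authors' algorithm or setup: the target (a 2-sparse sign parity), the network width, the learning rate, and all other hyperparameters are illustrative assumptions. It only shows the structure of "repeated-data" SGD on a two-layer network with a fixed second layer, and reports what fraction of the first-layer weight norm falls on the two relevant coordinates.

```python
import numpy as np

rng = np.random.default_rng(0)

d, p = 32, 16           # input dimension, hidden width (illustrative choices)
n_steps = 2000
lr = 0.5 / d

# Illustrative "hard" multi-index target: a 2-sparse sign parity.
def target(x):
    return np.sign(x[0]) * np.sign(x[1])

# Two-layer network f(x) = a . tanh(W x), second layer a kept fixed
# (a common simplification; only W is trained).
W = rng.standard_normal((p, d)) / np.sqrt(d)
a = rng.choice([-1.0, 1.0], size=p) / p

def forward(x, W):
    return a @ np.tanh(W @ x)

def grad_W(x, y, W):
    # Gradient of the squared loss L = (f(x) - y)^2 / 2 with respect to W.
    pre = W @ x
    err = forward(x, W) - y
    return np.outer(err * a * (1.0 - np.tanh(pre) ** 2), x)

for _ in range(n_steps):
    x = rng.standard_normal(d)   # one fresh Gaussian sample
    y = target(x)
    # Data repetition: TWO gradient steps on the SAME sample,
    # instead of the single step of idealized one-pass SGD.
    for _ in range(2):
        W -= lr * grad_W(x, y, W)

# Fraction of the first-layer weight norm on the relevant subspace (coords 0, 1).
frac = np.linalg.norm(W[:, :2]) ** 2 / np.linalg.norm(W) ** 2
print(frac)
```

At initialization this fraction is about 2/d; in the regime the paper analyzes, repetition is what lets the weights align with the relevant subspace despite the high information exponent of a parity. A finite-d toy run like this one is only meant to show the loop structure, not to reproduce the paper's scaling results.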
The study provides a detailed analysis of the learning dynamics of two-layer networks and shows that data repetition can significantly improve the efficiency of SGD in learning high-dimensional multi-index functions.