Neural network learns low-dimensional polynomials with SGD near the information-theoretic limit

June 4, 2024 | Jason D. Lee, Kazusato Oko, Taiji Suzuki, Denny Wu
This paper studies the learning of single-index target functions with stochastic gradient descent (SGD). The target function is $ f_{*}(\mathbf{x}) = \sigma_{*}(\langle \mathbf{x}, \boldsymbol{\theta} \rangle) $, where the link function $ \sigma_{*} $ is an unknown degree-$q$ polynomial with information exponent $p$ (the index of its lowest-degree nonzero Hermite coefficient), and the data are isotropic Gaussian in $ \mathbb{R}^d $. The paper shows that a two-layer neural network trained by SGD with a reused minibatch learns this target with sample and runtime complexity $ n \asymp T \asymp C(q) \cdot d \operatorname{polylog} d $, where $ C(q) $ depends only on the degree of $ \sigma_{*} $. This matches the information-theoretic limit up to polylogarithmic factors.

The key ingredient is the reuse of minibatches in the gradient computation, which gives SGD access to higher-order information beyond correlational queries and effectively reduces the information exponent of the target, enabling efficient learning. The paper also addresses the challenge of obtaining statistical guarantees for learning single-index models: with an appropriate choice of activation function and layer-wise optimization, SGD with reused data first achieves weak recovery of the index direction and eventually strong recovery, which yields small generalization error once the second layer is fit.

The main result establishes that two-layer neural networks trained by SGD with reused batches achieve near-optimal sample complexity for learning single-index polynomials, regardless of the information exponent. Theoretical analysis and supporting experiments show that reusing data lets SGD overcome the "curse of information exponent" (the roughly $ d^{\Theta(p)} $ sample requirement of single-pass SGD) and attain performance close to the information-theoretic limit. The paper also discusses implications of these findings for future research in deep learning and statistical learning theory.
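As a rough illustration of the setup described above, the sketch below trains the first layer of a small two-layer network on data from a single-index polynomial target, reusing each minibatch for a few gradient steps before fitting the second layer by ridge regression. This is a minimal toy implementation, not the paper's algorithm: the dimensions, learning rate, number of reuse steps, the ReLU activation, and the cubic Hermite link function `sigma_star` are all illustrative choices made here.

```python
# Minimal sketch (illustrative, not the paper's exact procedure): layer-wise training of a
# two-layer network on a single-index target f*(x) = sigma*(<x, theta>) with isotropic
# Gaussian inputs, where each minibatch is reused for a few SGD steps on the first layer.
import numpy as np

rng = np.random.default_rng(0)

d, width = 64, 128                              # input dimension, hidden width (illustrative)
theta = rng.standard_normal(d)
theta /= np.linalg.norm(theta)                  # planted index direction

def sigma_star(z):
    # Example degree-3 link function (third Hermite polynomial), information exponent 3.
    return z**3 - 3 * z

def relu(z):
    return np.maximum(z, 0.0)

# First-layer weights on the unit sphere; second-layer signs fixed during feature learning.
W = rng.standard_normal((width, d))
W /= np.linalg.norm(W, axis=1, keepdims=True)
a = rng.choice([-1.0, 1.0], size=width) / width

lr, batch, reuse_steps, T = 0.05, 256, 2, 2000  # illustrative hyperparameters

for t in range(T):
    X = rng.standard_normal((batch, d))         # fresh isotropic Gaussian minibatch
    y = sigma_star(X @ theta)
    for _ in range(reuse_steps):                # key ingredient: reuse the same batch
        H = X @ W.T                             # pre-activations, shape (batch, width)
        err = relu(H) @ a - y                   # squared-loss residual
        # Gradient of 0.5 * mean((pred - y)^2) with respect to W
        grad_W = ((err[:, None] * (H > 0)) * a[None, :]).T @ X / batch
        W -= lr * grad_W
        W /= np.linalg.norm(W, axis=1, keepdims=True)   # projected (spherical) SGD step

# Weak recovery diagnostic: alignment of learned neurons with the hidden direction theta.
alignment = np.abs(W @ theta)
print(f"max |<w_j, theta>| = {alignment.max():.3f}, mean = {alignment.mean():.3f}")

# Second stage (sketch): fit the output layer on the learned features by ridge regression.
X_fit = rng.standard_normal((4096, d))
Phi = relu(X_fit @ W.T)
a_hat = np.linalg.solve(Phi.T @ Phi + 1e-3 * np.eye(width), Phi.T @ sigma_star(X_fit @ theta))

X_test = rng.standard_normal((4096, d))
mse = np.mean((relu(X_test @ W.T) @ a_hat - sigma_star(X_test @ theta)) ** 2)
print(f"test MSE after second-layer fit: {mse:.3f}")
```

The inner `reuse_steps > 1` loop is where minibatch reuse enters; setting it to 1 recovers single-pass online SGD, the regime in which the information-exponent barrier applies.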