Error bounds for approximations with deep ReLU networks

May 2, 2017 | Dmitry Yarotsky
This paper investigates the expressive power of shallow and deep neural networks with piecewise linear activation functions, specifically the ReLU (Rectified Linear Unit). The study focuses on the approximation of functions in Sobolev spaces, providing rigorous upper and lower bounds on the network complexity required to reach a given accuracy \( \epsilon \). Key findings include:

1. **Upper Bounds**:
   - **Deep ReLU networks for smooth functions**: Deep ReLU networks approximate smooth functions more efficiently than shallow ones. For example, the function \( f(x) = x^2 \) on \([0, 1]\) can be approximated to accuracy \( \epsilon \) by a ReLU network whose depth and complexity are both \( O(\ln(1/\epsilon)) \) (see the sketch below).
   - **General smooth functions**: Any function in the unit ball of \( \mathcal{W}^{n,\infty}([0, 1]^d) \) can be approximated with error \( \epsilon \) by a ReLU network of depth \( O(\ln(1/\epsilon)) \) with \( O(\epsilon^{-d/n} \ln(1/\epsilon)) \) connections and computation units.
   - **Adaptive architectures**: For Lipschitz functions on \([0, 1]\), adaptive (function-dependent) depth-6 architectures achieve error \( \epsilon \) with only \( O\!\left(\frac{1}{\epsilon \ln(1/\epsilon)}\right) \) connections and computation units, beating the \( \sim 1/\epsilon \) complexity that any continuous weight-selection scheme must incur.
2. **Lower Bounds**:
   - **Continuous nonlinear widths**: If the network weights depend continuously on the approximated function, at least \( \sim \epsilon^{-d/n} \) connections and weights are required; this bound follows from the method of continuous nonlinear widths.
   - **VC-dimension and constructed function spaces**: Further lower bounds, which also cover discontinuous weight selection, are derived from VC-dimension arguments and from function spaces constructed using deep dependency graphs.

The paper discusses the implications of these results, highlighting the efficiency and expressiveness advantages of deep networks over shallow ones. The findings contribute to the theoretical understanding of deep neural networks and their practical applications.
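As a quick illustration of the \( x^2 \) result: the construction composes the "tooth" function \( g(x) = 2\min(x, 1-x) \) with itself, so that the \( s \)-fold composition \( g_s \) is a sawtooth with \( 2^{s-1} \) teeth, and \( f_m(x) = x - \sum_{s=1}^{m} g_s(x)/4^s \) is the piecewise-linear interpolant of \( x^2 \) on a grid of \( 2^m + 1 \) points, with uniform error at most \( 2^{-2m-2} \). Since \( g \) is itself a small fixed ReLU subnetwork, \( f_m \) is a ReLU network of depth \( O(m) = O(\ln(1/\epsilon)) \). The NumPy sketch below (illustrative code, not taken from the paper) checks this error decay numerically:

```python
import numpy as np

def hat(x):
    """Tooth function g(x) = 2*min(x, 1-x) on [0, 1].

    Expressible with three ReLUs: g(x) = 2*relu(x) - 4*relu(x - 1/2) + 2*relu(x - 1).
    """
    return 2 * np.minimum(x, 1 - x)

def approx_square(x, m):
    """f_m(x) = x - sum_{s=1}^{m} g_s(x) / 4^s, where g_s is the s-fold
    composition of the hat function (a sawtooth with 2^(s-1) teeth).
    f_m is the piecewise-linear interpolant of x^2 on a grid of 2^m + 1 points."""
    g = x.copy()
    f = x.copy()
    for s in range(1, m + 1):
        g = hat(g)            # g_s = g o g_{s-1}
        f = f - g / 4.0 ** s  # refine the approximation of x^2
    return f

if __name__ == "__main__":
    x = np.linspace(0.0, 1.0, 10001)
    for m in range(1, 8):
        err = np.max(np.abs(approx_square(x, m) - x ** 2))
        # The uniform error should stay below 2^(-2m-2), i.e. drop ~4x per extra composition.
        print(f"m={m}: max error = {err:.3e}, bound = {2.0 ** (-2 * m - 2):.3e}")
```

Each extra composition adds only a constant number of layers and units, so reaching accuracy \( \epsilon \) takes roughly \( \tfrac{1}{2}\log_2(1/\epsilon) \) compositions, consistent with the \( O(\ln(1/\epsilon)) \) depth and complexity bound quoted above.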