Minimizing Finite Sums with the Stochastic Average Gradient


2017 | Mark Schmidt, Nicolas Le Roux, Francis Bach
This paper presents the Stochastic Average Gradient (SAG) method for minimizing finite sums of smooth convex functions. SAG improves upon stochastic gradient (SG) methods by keeping a memory of previously computed gradient values, which yields faster convergence rates. For convex problems, SAG achieves a convergence rate of $ O(1/k) $, faster than the $ O(1/\sqrt{k}) $ rate of SG methods. For strongly-convex problems, SAG achieves a linear convergence rate of $ O(\rho^k) $, in contrast to the sublinear rates of SG methods. In practice, SAG also often requires fewer gradient evaluations than deterministic (full) gradient methods. Like SG methods, SAG has an iteration cost that is independent of the number of terms in the sum, yet it attains convergence rates comparable to those of deterministic gradient methods.

The SAG iterations take the form $ x^{k+1} = x^k - \frac{\alpha_k}{n} \sum_{i=1}^n y_i^k $, where $ y_i^k $ equals the gradient $ f_i'(x^k) $ if index $ i $ is the one sampled at iteration $ k $, and otherwise retains its previous value $ y_i^{k-1} $. Since only one of the $ n $ stored gradients is refreshed per iteration, SAG maintains a low iteration cost while still converging quickly.

The paper compares SAG with related methods, including momentum, gradient averaging, iterate averaging, and stochastic versions of full gradient methods, and shows that SAG outperforms them in terms of both convergence rates and practical performance. A convergence analysis under standard assumptions establishes the $ O(1/k) $ rate for convex problems and the linear rate for strongly-convex problems, and shows that SAG is more robust to step size selection and can converge faster with appropriate initialization.

The paper also covers practical implementation details: structured gradients, just-in-time parameter updates, regularization, warm starting, larger step sizes, line-search when L is not known, mini-batches for vectorized computation, and non-uniform example selection. These details show that SAG can be implemented efficiently for large-scale problems and adapted to various problem structures. The paper concludes that SAG is a powerful method for minimizing finite sums of smooth convex functions, with faster convergence rates and better practical performance than SG and deterministic gradient methods.
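As a rough illustration of the update described above, the following Python sketch implements the basic SAG recursion with a constant step size. It is a minimal sketch, not the authors' reference implementation: the function name sag, the grad_i(x, i) callback, and the choice of a fixed alpha (the paper's analysis uses step sizes on the order of 1/L, and it also discusses larger step sizes and line search) are assumptions made for the example.

import numpy as np

def sag(grad_i, x0, n, alpha, num_iters, rng=None):
    """Minimal sketch of the basic SAG update (illustrative, not the paper's code).

    grad_i(x, i) should return the gradient of the i-th term f_i at x.
    alpha is a constant step size; the paper's theory uses values on the
    order of 1/L, where L is a Lipschitz constant of the gradients f_i'.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.array(x0, dtype=float)
    y = np.zeros((n, x.size))   # y[i] holds the most recently computed gradient of f_i
    d = np.zeros(x.size)        # running sum of stored gradients, sum_i y[i]

    for _ in range(num_iters):
        i = rng.integers(n)     # sample one term uniformly at random
        g = grad_i(x, i)        # fresh gradient of the sampled term only
        d += g - y[i]           # update the running sum in O(p) time
        y[i] = g                # remember this gradient for later iterations
        x -= (alpha / n) * d    # x^{k+1} = x^k - (alpha/n) * sum_i y_i^k
    return x

Maintaining the running sum d is what makes the iteration cost independent of n: each step touches one fresh gradient and one stored gradient, rather than recomputing all n terms as a full gradient method would.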