6 Mar 2015 | Rong Ge*, Furong Huang†, Chi Jin‡, Yang Yuan§
This paper analyzes the convergence of stochastic gradient descent (SGD) for optimizing non-convex functions, focusing on the challenge of reaching a local minimum in the presence of saddle points. The authors introduce the *strict saddle* property, under which every saddle point has a direction of strictly negative curvature, making efficient escape possible. Under this property, they show that SGD converges to a local minimum in a polynomial number of iterations, even for non-convex functions with exponentially many local minima and saddle points. This is the first work to provide global convergence guarantees for SGD in such settings.
The analysis is applied to orthogonal tensor decomposition, a problem common in learning latent variable models. The authors propose a new optimization formulation for tensor decomposition that satisfies the strict saddle property. This leads to the first online algorithm for orthogonal tensor decomposition with global convergence guarantees. The paper includes theoretical results and empirical experiments demonstrating the effectiveness of the proposed method.
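Why the tensor components are favorable points can be illustrated numerically. For an orthogonal tensor T = Σᵢ vᵢ⊗⁴ with orthonormal vᵢ, the fourth-order form T(u, u, u, u) = Σᵢ ⟨vᵢ, u⟩⁴ satisfies Σᵢ ⟨vᵢ, u⟩⁴ ≤ (Σᵢ ⟨vᵢ, u⟩²)² = 1 on the unit sphere, with equality exactly at u = ±vᵢ. The sketch below checks this (illustrative only; it is not the paper's exact objective or algorithm, and all names are my own):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5
# columns of Q are random orthonormal components v_1, ..., v_d
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))

def quartic_form(u):
    # T(u,u,u,u) = sum_i <v_i, u>^4 for the orthogonal tensor T = sum_i v_i (tensor)^4
    return float(np.sum((Q.T @ u) ** 4))

# at a component v_1 the form attains its maximum value 1 on the unit sphere ...
at_component = quartic_form(Q[:, 0])

# ... while a generic unit vector scores strictly less
u = rng.standard_normal(d)
u /= np.linalg.norm(u)
at_random = quartic_form(u)
```

Maximizing this form over the sphere therefore recovers components one at a time; the appeal of a strict-saddle formulation is that plain SGD provably finds such maximizers despite the non-convexity.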