Identifying and attacking the saddle point problem in high-dimensional non-convex optimization


10 Jun 2014 | Yann N. Dauphin, Razvan Pascanu, Caglar Gulcehre, Kyunghyun Cho, Surya Ganguli, Yoshua Bengio
This paper argues that in high-dimensional non-convex optimization, saddle points are a more significant challenge than local minima, which are often thought to be the main obstacle for optimization algorithms. Drawing on results from statistical physics, random matrix theory, neural network theory, and empirical evidence, the authors show that saddle points are exponentially more common than local minima in high-dimensional problems. These saddle points are surrounded by high-error plateaus that can slow down learning and create the illusion of local minima.

To address this issue, the authors propose a new second-order optimization method, the saddle-free Newton method, which can rapidly escape from saddle points, unlike gradient descent or quasi-Newton methods. The method leverages second-order curvature information in a fundamentally different way than quasi-Newton methods and outperforms them on some high-dimensional problems involving deep or recurrent networks. The authors validate their theory experimentally by training neural networks and showing that the saddle-free Newton method outperforms other optimization methods in both convergence speed and error reduction. The results suggest that the saddle-free Newton method is a promising approach for optimizing high-dimensional non-convex functions.
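To make the contrast with gradient descent concrete, below is a minimal NumPy sketch of the saddle-free Newton update on a toy 2-D saddle, f(x, y) = x² − y². The core idea, as described in the paper, is to rescale the gradient by |H|⁻¹, where |H| is the Hessian with each eigenvalue replaced by its absolute value, so that negative-curvature directions are followed downhill instead of being attracted back toward the saddle. The toy objective, learning rate, and damping constant here are illustrative choices, not the authors' experimental setup, and the exact eigendecomposition below is only feasible for tiny problems (the paper's experiments rely on approximations for large models).

```python
import numpy as np

# Toy 2-D objective with a saddle point at the origin:
# f(x, y) = x^2 - y^2 (curves up along x, down along y).
def f(theta):
    x, y = theta
    return x**2 - y**2

def grad(theta):
    x, y = theta
    return np.array([2.0 * x, -2.0 * y])

def hessian(theta):
    # Constant Hessian for this quadratic toy problem.
    return np.array([[2.0, 0.0],
                     [0.0, -2.0]])

def saddle_free_newton_step(theta, damping=1e-3):
    """One saddle-free Newton step: rescale the gradient by |H|^{-1},
    where |H| replaces each Hessian eigenvalue with its absolute value
    (small damping added for numerical stability)."""
    eigvals, eigvecs = np.linalg.eigh(hessian(theta))
    abs_H = eigvecs @ np.diag(np.abs(eigvals) + damping) @ eigvecs.T
    return theta - np.linalg.solve(abs_H, grad(theta))

def gradient_descent_step(theta, lr=0.1):
    return theta - lr * grad(theta)

# Start near the saddle, slightly off the unstable (y) direction.
theta0 = np.array([1.0, 1e-4])
gd, sfn = theta0.copy(), theta0.copy()
for _ in range(20):
    gd = gradient_descent_step(gd)
    sfn = saddle_free_newton_step(sfn)

print("gradient descent  :", gd, "f =", f(gd))    # still hovering near the saddle plateau
print("saddle-free Newton:", sfn, "f =", f(sfn))  # escapes along the negative-curvature direction
```

Running this, gradient descent shrinks toward the saddle and makes little progress along the escape direction, while the saddle-free step amplifies the negative-curvature component and drives the objective far below the saddle value within a few iterations, which is the qualitative behavior the paper's experiments demonstrate at scale.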