Identifying and attacking the saddle point problem in high-dimensional non-convex optimization

10 Jun 2014 | Yann N. Dauphin, Razvan Pascanu, Caglar Gulcehre, Kyunghyun Cho, Surya Ganguli, Yoshua Bengio
The paper addresses the challenge of minimizing non-convex error functions in high-dimensional spaces, a common issue in various scientific and engineering fields. Traditional methods like gradient descent and quasi-Newton algorithms often struggle due to the proliferation of saddle points rather than local minima. The authors argue that saddle points, which are surrounded by high-error plateaus, significantly slow down learning and can mislead optimization algorithms into thinking they have reached a local minimum. They propose a new approach, the saddle-free Newton method, which leverages curvature information to rapidly escape saddle points, unlike traditional methods that can get trapped. The method is applied to deep and recurrent neural network training, showing superior optimization performance compared to existing methods. The paper provides theoretical arguments from statistical physics, random matrix theory, and neural network theory, along with empirical evidence, to support its claims.
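The core mechanism of the saddle-free Newton method is to rescale the gradient by the inverse of |H|, the matrix obtained by taking the absolute value of the Hessian's eigenvalues: this keeps Newton-style, curvature-aware step sizes while turning saddle points into repellers rather than attractors. Below is a minimal NumPy sketch of that idea on a toy two-dimensional objective. The helper names, toy function, and damping constant are illustrative assumptions, not the paper's implementation, which approximates this update in a low-dimensional Krylov subspace to stay tractable for large networks.

```python
import numpy as np

def saddle_free_newton_step(grad, hessian, damping=1e-3):
    """Saddle-free Newton direction: rescale the gradient by |H|^{-1}.

    A plain Newton step -H^{-1} g is attracted to any stationary point,
    including saddles. Taking the absolute value of the Hessian's eigenvalues
    flips the step along negative-curvature directions, so the update moves
    away from saddle points instead of toward them.
    """
    eigvals, eigvecs = np.linalg.eigh(hessian)   # H = V diag(lambda) V^T
    abs_eigvals = np.abs(eigvals) + damping      # |lambda|, damped for stability
    # step = -V diag(1 / |lambda|) V^T g
    return -eigvecs @ ((eigvecs.T @ grad) / abs_eigvals)

# Toy objective f(x, y) = x^2 - y^2 + y^4 / 4: a saddle point at the origin,
# minima at (0, +/- sqrt(2)). (Illustrative example, not from the paper.)
def grad_f(p):
    x, y = p
    return np.array([2.0 * x, -2.0 * y + y**3])

def hess_f(p):
    _, y = p
    return np.array([[2.0, 0.0], [0.0, -2.0 + 3.0 * y**2]])

p = np.array([0.5, 0.01])   # start close to the saddle
for _ in range(25):
    p = p + saddle_free_newton_step(grad_f(p), hess_f(p))
print(p)  # approaches (0, sqrt(2)) instead of stalling at the saddle
```

On the same toy problem, a plain Newton step would divide the y-component of the gradient by the negative curvature near the origin and be pulled back toward the saddle, which is exactly the failure mode the rescaling by |H| is designed to avoid.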