Deep Learning without Poor Local Minima

23 May 2016 | Kenji Kawaguchi
This paper addresses open problems in deep learning, focusing on the optimization landscape of deep neural networks. The author proves a conjecture published in 1989 and partially resolves an open problem announced at the Conference on Learning Theory (COLT) 2015. For deep linear networks without unrealistic assumptions, and for deep nonlinear networks under an independence assumption adopted from prior work, the paper establishes that the loss function is non-convex and non-concave, that every local minimum is a global minimum, and that every critical point that is not a global minimum is a saddle point. Additionally, the properties of saddle points differ between shallow networks (three layers) and deeper networks: deeper networks can have degenerate saddle points at which the Hessian has no negative eigenvalue. The results show that although deep learning remains more challenging than classical machine learning because of non-convexity, the absence of poor local minima and the structure of the saddle points keep the problem tractable. The paper also discusses implications for the practical training of deep models, suggesting that the remaining difficulties, chiefly those posed by saddle points, can be addressed with greedy optimization methods.
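To make these landscape properties concrete, here is a minimal worked example in the spirit of the paper's deep linear case (the toy setup and notation below are our own illustration, not taken from the paper). Consider a scalar linear network trained on the single pair (x, y) = (1, 1), so the squared loss with two layers of weights is

\[
L(w_1, w_2) = \tfrac{1}{2}\,(w_2 w_1 - 1)^2,
\qquad
\nabla L = \big( (w_2 w_1 - 1)\, w_2,\; (w_2 w_1 - 1)\, w_1 \big).
\]

The critical points are the global minima \( \{ (w_1, w_2) : w_2 w_1 = 1 \} \) together with the origin. At \( (0, 0) \) the Hessian is

\[
\nabla^2 L(0,0) = \begin{pmatrix} 0 & -1 \\ -1 & 0 \end{pmatrix},
\]

whose eigenvalues are \( \pm 1 \): the loss is non-convex and non-concave, and the only non-global critical point is a strict saddle, matching the shallow (three-layer) case. Adding one more weight layer, \( L(w_1, w_2, w_3) = \tfrac{1}{2}(w_3 w_2 w_1 - 1)^2 \) has a critical point at the origin whose Hessian is the zero matrix, yet the loss decreases along the path \( w_1 = w_2 = w_3 = t \) for small \( t > 0 \); this degenerate saddle, with no negative Hessian eigenvalue, illustrates how saddle points differ once the network has more than three layers.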