Deep Learning without Poor Local Minima


23 May 2016 | Kenji Kawaguchi
This paper proves a conjecture from 1989 and resolves an open problem posed at COLT 2015 concerning the loss surfaces of deep neural networks. For deep linear networks, and for deep nonlinear networks under simplifying assumptions, the author shows that the loss function is non-convex and non-concave, that every local minimum is a global minimum, and that every critical point which is not a global minimum is a saddle point. The saddle points also behave differently in shallow and deeper networks: networks with more than three layers can have "bad" saddle points whose Hessian has no negative eigenvalue, whereas shallower networks cannot. For deep linear networks, these results hold for any depth and any layer widths, without unrealistic assumptions. The findings suggest that training deep models is harder than training classical convex models because of non-convexity, yet not as hard as one might fear, since poor local minima are absent and the saddle points have a known structure. The paper also highlights the remaining gap between such theoretical guarantees and deep learning practice.
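To make the "non-convex yet no poor local minima" picture concrete, here is a minimal sketch (not from the paper) using a hypothetical two-layer scalar linear network f(x) = w2·w1·x fit to a target map c. Its squared loss L(w1, w2) = (w1·w2 − c)² is non-convex, every local minimum (the manifold w1·w2 = c) is a global minimum, and the only other critical point, the origin, is a saddle point; the script checks this numerically via the gradient and Hessian eigenvalues.

```python
import numpy as np

# Toy illustration (not the paper's proof): two-layer scalar linear network
# f(x) = w2 * w1 * x fit to targets y = c * x, giving L(w1, w2) = (w1*w2 - c)^2.
c = 1.0  # hypothetical target linear map

def loss(w1, w2):
    return (w1 * w2 - c) ** 2

def grad(w1, w2):
    r = w1 * w2 - c
    return np.array([2.0 * r * w2, 2.0 * r * w1])

def hessian(w1, w2):
    r = w1 * w2 - c
    off = 2.0 * w1 * w2 + 2.0 * r  # mixed partial d^2L / dw1 dw2
    return np.array([[2.0 * w2 ** 2, off],
                     [off, 2.0 * w1 ** 2]])

# Critical point at the origin: gradient vanishes, Hessian eigenvalues are
# (+2c, -2c), i.e. indefinite, so (0, 0) is a saddle point, not a minimum.
print(grad(0.0, 0.0), np.linalg.eigvalsh(hessian(0.0, 0.0)))

# Any point on the manifold w1 * w2 = c attains loss 0, the global minimum;
# its Hessian is positive semidefinite (one flat direction along the manifold).
w1, w2 = 2.0, c / 2.0
print(loss(w1, w2), np.linalg.eigvalsh(hessian(w1, w2)))
```

In this toy case every saddle has a strictly negative Hessian eigenvalue; the paper shows that for linear networks deeper than three layers, "bad" saddle points without such a negative eigenvalue can also appear.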