Optimization Methods for Large-Scale Machine Learning

February 12, 2018 | Léon Bottou, Frank E. Curtis, Jorge Nocedal
This paper reviews and discusses the past, present, and future of numerical optimization algorithms in the context of machine learning. Through case studies on text classification and deep neural networks, it explores how optimization problems arise in machine learning and what makes them challenging. A major theme is that large-scale machine learning typically relies on stochastic gradient (SG) methods, because conventional batch gradient-based techniques become impractical when the training set is very large. The paper presents a comprehensive theory of SG, discusses its practical behavior, and highlights opportunities for improved algorithms. It also discusses next-generation optimization methods, including noise reduction and second-order derivative approximation techniques.

The paper begins with an introduction, followed by case studies on text classification via convex optimization and on perceptual tasks via deep neural networks. These case studies illustrate the variety of optimization problems in machine learning, ranging from convex problems to highly nonlinear and nonconvex ones. The paper then provides an overview of optimization methods, contrasting stochastic and batch methods, motivating the stochastic approach, and analyzing the convergence behavior of stochastic gradient methods.

It goes on to discuss noise reduction methods, such as reducing the noise at a geometric rate, dynamic sample size methods, and gradient aggregation techniques like SVRG and SAGA. It also covers second-order methods, including Hessian-free Newton methods, stochastic quasi-Newton methods, and Gauss-Newton methods. Other popular methods, such as gradient methods with momentum and coordinate descent, are discussed as well. The paper concludes with a summary and perspectives on the future of optimization methods for machine learning.
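To make the contrast between a basic SG iteration and a gradient-aggregation method such as SVRG concrete, here is a minimal sketch on a toy least-squares problem. The problem data, step sizes, and loop lengths are illustrative assumptions and not the paper's experimental setup.

import numpy as np

# A minimal sketch (not from the paper) contrasting plain SG with the SVRG
# gradient-aggregation scheme on a toy least-squares problem.
rng = np.random.default_rng(0)
n, d = 1000, 20                      # number of samples, parameter dimension
A = rng.normal(size=(n, d))
x_true = rng.normal(size=d)
b = A @ x_true + 0.1 * rng.normal(size=n)

def grad_i(x, i):
    """Stochastic gradient of the i-th term f_i(x) = 0.5 * (a_i^T x - b_i)^2."""
    return (A[i] @ x - b[i]) * A[i]

def full_grad(x):
    """Full (batch) gradient, averaged over all n terms."""
    return A.T @ (A @ x - b) / n

def sgd(x0, alpha=0.01, iters=5000):
    """Basic stochastic gradient: one randomly sampled component gradient per step."""
    x = x0.copy()
    for _ in range(iters):
        i = rng.integers(n)
        x -= alpha * grad_i(x, i)
    return x

def svrg(x0, alpha=0.05, epochs=10, inner=1000):
    """SVRG: periodically compute a full gradient at a snapshot point and use it
    to correct (variance-reduce) the stochastic gradients in the inner loop."""
    x_snap = x0.copy()
    for _ in range(epochs):
        mu = full_grad(x_snap)       # full gradient at the snapshot
        x = x_snap.copy()
        for _ in range(inner):
            i = rng.integers(n)
            g = grad_i(x, i) - grad_i(x_snap, i) + mu   # variance-reduced estimate
            x -= alpha * g
        x_snap = x
    return x_snap

x0 = np.zeros(d)
for name, x in [("SG", sgd(x0)), ("SVRG", svrg(x0))]:
    print(f"{name}: ||x - x_true|| = {np.linalg.norm(x - x_true):.4f}")

Because the snapshot's full gradient cancels much of the sampling noise, the variance of the SVRG update shrinks as the iterates approach a solution, which is what allows a larger constant step size than plain SG in this sketch.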