Optimization Methods for Large-Scale Machine Learning

February 12, 2018 | Léon Bottou, Frank E. Curtis, Jorge Nocedal
This paper reviews and discusses the past, present, and future of numerical optimization algorithms in the context of machine learning. Through case studies on text classification and deep neural networks, it explores how optimization problems arise in machine learning and what makes them challenging. A major theme is that large-scale machine learning typically relies on stochastic gradient (SG) methods, because conventional batch gradient-based techniques become impractical when the training set is very large. The paper presents a comprehensive theory of SG, discusses its practical behavior, and highlights opportunities for improved algorithms. It also discusses next-generation optimization methods, including noise reduction and second-order derivative approximation techniques.

The paper begins with an introduction, followed by case studies on text classification via convex optimization and on perceptual tasks via deep neural networks. These case studies illustrate the variety of optimization problems in machine learning, ranging from convex problems to highly nonlinear and nonconvex ones. The paper then provides an overview of optimization methods, contrasting stochastic and batch methods, motivating the stochastic approach, and analyzing the convergence behavior of stochastic gradient methods.

It goes on to discuss noise reduction methods, such as reducing the noise at a geometric rate, dynamic sample size methods, and gradient aggregation techniques like SVRG and SAGA. It also covers second-order methods, including Hessian-free Newton methods, stochastic quasi-Newton methods, and Gauss-Newton methods. Other popular methods, such as gradient methods with momentum and coordinate descent, are discussed as well. The paper concludes with a summary and perspectives on the future of optimization methods for machine learning.
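To make the contrast between a basic SG iteration and a gradient-aggregation method such as SVRG concrete, here is a minimal sketch on a toy least-squares problem. The problem data, step sizes, and loop lengths are illustrative assumptions and not the paper's experimental setup.

import numpy as np

# A minimal sketch (not from the paper) contrasting plain SG with the SVRG
# gradient-aggregation scheme on a toy least-squares problem.
rng = np.random.default_rng(0)
n, d = 1000, 20                      # number of samples, parameter dimension
A = rng.normal(size=(n, d))
x_true = rng.normal(size=d)
b = A @ x_true + 0.1 * rng.normal(size=n)

def grad_i(x, i):
    """Stochastic gradient of the i-th term f_i(x) = 0.5 * (a_i^T x - b_i)^2."""
    return (A[i] @ x - b[i]) * A[i]

def full_grad(x):
    """Full (batch) gradient, averaged over all n terms."""
    return A.T @ (A @ x - b) / n

def sgd(x0, alpha=0.01, iters=5000):
    """Basic stochastic gradient: one randomly sampled component gradient per step."""
    x = x0.copy()
    for _ in range(iters):
        i = rng.integers(n)
        x -= alpha * grad_i(x, i)
    return x

def svrg(x0, alpha=0.05, epochs=10, inner=1000):
    """SVRG: periodically compute a full gradient at a snapshot point and use it
    to correct (variance-reduce) the stochastic gradients in the inner loop."""
    x_snap = x0.copy()
    for _ in range(epochs):
        mu = full_grad(x_snap)       # full gradient at the snapshot
        x = x_snap.copy()
        for _ in range(inner):
            i = rng.integers(n)
            g = grad_i(x, i) - grad_i(x_snap, i) + mu   # variance-reduced estimate
            x -= alpha * g
        x_snap = x
    return x_snap

x0 = np.zeros(d)
for name, x in [("SG", sgd(x0)), ("SVRG", svrg(x0))]:
    print(f"{name}: ||x - x_true|| = {np.linalg.norm(x - x_true):.4f}")

Because the snapshot's full gradient cancels much of the sampling noise, the variance of the SVRG update shrinks as the iterates approach a solution, which is what allows a larger constant step size than plain SG in this sketch.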