Linear Convergence of Gradient and Proximal-Gradient Methods Under the Polyak-Lojasiewicz Condition

12 Sep 2020 | Hamed Karimi, Julie Nutini, and Mark Schmidt
This paper explores the Polyak-Lojasiewicz (PL) inequality, a condition introduced in 1963, and its implications for the linear convergence of gradient descent methods. The PL inequality is shown to be weaker than more recent conditions used to prove linear convergence rates without strong convexity. The authors demonstrate that the PL inequality can be used to provide new analyses of various optimization methods, including randomized and greedy coordinate descent, sign-based gradient descent, and stochastic gradient methods. They also propose a generalization of the PL inequality for proximal-gradient methods, which leads to simple proofs of linear convergence for non-smooth optimization problems. The paper applies these results to a wide range of machine learning problems, such as least squares, logistic regression, boosting, and support vector machines, showing that gradient descent can achieve linear convergence rates under the PL inequality even when the objective function is not strongly convex.
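For concreteness, the PL inequality and the resulting rate can be stated briefly. In the standard formulation, for an L-smooth function f with minimum value f*, the condition requires a constant mu > 0 bounding the squared gradient norm below by the suboptimality gap, and gradient descent with step size 1/L then contracts that gap geometrically. The display below restates these two facts in LaTeX; the notation follows the usual presentation of the result.

```latex
% PL inequality: there exists \mu > 0 such that, for all x,
\frac{1}{2}\,\lVert \nabla f(x) \rVert^{2} \;\ge\; \mu \bigl( f(x) - f^{*} \bigr),
% which, for gradient descent on an L-smooth f with step size 1/L, yields the linear rate
f(x_{k}) - f^{*} \;\le\; \left( 1 - \frac{\mu}{L} \right)^{k} \bigl( f(x_{0}) - f^{*} \bigr).
```

As an illustrative numerical sketch (not taken from the paper), the snippet below runs gradient descent on a rank-deficient least-squares problem, f(x) = 0.5 * ||Ax - b||^2; the matrix A, vector b, and iteration count are arbitrary choices made here for the demonstration. Such an objective satisfies the PL inequality but is not strongly convex, yet the suboptimality gap still shrinks geometrically, matching the rate above.

```python
# Minimal sketch: gradient descent on a rank-deficient least-squares problem.
# f(x) = 0.5 * ||A x - b||^2 is not strongly convex (A^T A is singular) but
# satisfies the PL inequality, so the suboptimality decays geometrically.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 10)) @ np.diag([1.0] * 5 + [0.0] * 5)  # rank 5
b = rng.standard_normal(20)

L = np.linalg.norm(A.T @ A, 2)                 # Lipschitz constant of the gradient
f_star = 0.5 * np.linalg.norm(A @ np.linalg.pinv(A) @ b - b) ** 2  # optimal value
x = np.zeros(10)

for k in range(1, 201):
    grad = A.T @ (A @ x - b)
    x -= grad / L                              # gradient step with step size 1/L
    if k % 50 == 0:
        gap = 0.5 * np.linalg.norm(A @ x - b) ** 2 - f_star
        print(f"iteration {k:3d}: suboptimality {gap:.3e}")
```

Running the sketch, the printed suboptimality drops by a roughly constant factor over each block of iterations, which is the signature of the linear rate the paper establishes under the PL condition.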