Stochastic Dual Coordinate Ascent Methods for Regularized Loss Minimization

30 Jan 2013 | Shai Shalev-Shwartz, Tong Zhang
This paper presents a new analysis of Stochastic Dual Coordinate Ascent (SDCA) for regularized loss minimization, showing that it enjoys strong theoretical guarantees that are comparable to, or better than, those of Stochastic Gradient Descent (SGD). The analysis justifies the effectiveness of SDCA in practical applications. The paper focuses on the convergence of the duality gap for SDCA, analyzing it for both $L$-Lipschitz loss functions and $(1/\gamma)$-smooth loss functions. For Lipschitz loss functions, the runtime is $\tilde{O}(n + L^2 / (\lambda \epsilon))$, and for $(1/\gamma)$-smooth loss functions it is $\tilde{O}((n + 1 / (\lambda \gamma)) \log(1 / \epsilon))$. The paper also discusses the practical importance of randomization in SDCA and compares it with SGD, noting that SGD can make faster progress in the first few epochs, whereas SDCA catches up and overtakes it thereafter. Additionally, the paper provides a refined analysis for almost-smooth loss functions, demonstrating potential asymptotic advantages of SDCA over SGD.
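To make the setup concrete, here is a minimal Python sketch of SDCA for the L2-regularized hinge loss (linear SVM), the canonical loss covered by the paper's Lipschitz analysis. The closed-form coordinate update below follows the paper's hinge-loss formula; the function name `sdca_hinge` and its defaults are illustrative, not the authors' reference implementation.

```python
import numpy as np

def sdca_hinge(X, y, lam, epochs=10, seed=0):
    """Illustrative SDCA sketch for the L2-regularized hinge loss (SVM).

    X: (n, d) array of examples x_i; y: labels in {-1, +1}; lam: the
    regularization parameter lambda.  A minimal sketch of the hinge-loss
    closed-form coordinate update, not the authors' reference code.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    alpha = np.zeros(n)   # dual variables, one per example
    w = np.zeros(d)       # primal iterate w(alpha) = (1/(lam*n)) * sum_i alpha_i x_i
    sq_norms = np.einsum("ij,ij->i", X, X)
    for _ in range(epochs * n):
        i = rng.integers(n)   # sample a dual coordinate uniformly at random
        if sq_norms[i] == 0.0:
            continue
        # Closed-form maximization of the dual objective in coordinate i:
        # alpha_i <- y_i * clip(alpha_i*y_i + (1 - y_i <x_i, w>) * lam*n / ||x_i||^2, 0, 1)
        step = (1.0 - y[i] * (X[i] @ w)) * lam * n / sq_norms[i]
        new_alpha_i = y[i] * np.clip(alpha[i] * y[i] + step, 0.0, 1.0)
        # Keep w consistent with the updated dual variable
        w += ((new_alpha_i - alpha[i]) / (lam * n)) * X[i]
        alpha[i] = new_alpha_i
    return w, alpha
```

Since $w(\alpha) = \frac{1}{\lambda n} \sum_i \alpha_i x_i$ is maintained incrementally, each coordinate update costs $O(d)$; comparing the primal objective $P(w(\alpha))$ with the dual objective $D(\alpha)$ along the way yields the duality gap whose convergence the paper's analysis bounds.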