HOGWILD!: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent

June 2011 | Feng Niu, Benjamin Recht, Christopher Ré and Stephen J. Wright
HOGWILD! is a lock-free scheme for parallelizing stochastic gradient descent (SGD) in which processors read and write shared memory without locking, yet the method still achieves nearly optimal convergence rates on sparse optimization problems. Each processor updates only the few components of the decision variable touched by its sampled gradient; when data access is sparse, most updates modify a small part of the variable, so concurrent, unsynchronized writes rarely collide.

In experiments on several machine learning tasks, including sparse SVM training, matrix completion, and graph cuts, this approach outperforms locking-based schemes by an order of magnitude. The theoretical analysis shows that HOGWILD! achieves nearly linear speedup in the number of processors when the problem is sparse, and that constant-stepsize schemes retain robust 1/k convergence rates. Implemented on multicore systems, where shared memory offers low latency and high throughput, HOGWILD! also outperforms round-robin and averaging-based parallel methods, even when individual gradient computations are slow. The work demonstrates that lock-free parallelism enables efficient and scalable data analysis on multicore machines.
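To make the access pattern concrete, below is a minimal sketch of a HOGWILD!-style update loop in Python, using a lock-free shared array from the standard multiprocessing module. The sparse least-squares objective, the step size GAMMA, and the worker and step counts are illustrative assumptions, not details from the paper; each worker samples one example and writes back only the coordinates that example touches, with no synchronization.

    # Hypothetical HOGWILD!-style sketch: sparse least squares with
    # unsynchronized shared-memory SGD. Constants are illustrative.
    import numpy as np
    from multiprocessing import Process, Array

    GAMMA = 0.01       # constant step size (assumed value)
    N_WORKERS = 4
    N_STEPS = 10_000

    def worker(shared_x, data):
        # View the lock-free shared buffer as a NumPy vector; all workers
        # read and write it concurrently without any locking.
        x = np.frombuffer(shared_x, dtype=np.float64)
        rng = np.random.default_rng()
        for _ in range(N_STEPS):
            idx, a, b = data[rng.integers(len(data))]  # sample one sparse example
            # Gradient of (a . x[idx] - b)^2 touches only the coordinates in idx.
            g = 2.0 * (a @ x[idx] - b) * a
            x[idx] -= GAMMA * g                        # unsynchronized sparse write

    if __name__ == "__main__":
        dim = 100
        rng = np.random.default_rng(0)
        # Each example touches only 3 of 100 coordinates: a sparse access pattern.
        data = [(rng.choice(dim, size=3, replace=False),
                 rng.standard_normal(3),
                 rng.standard_normal())
                for _ in range(1000)]
        shared_x = Array("d", dim, lock=False)         # shared memory, no lock
        procs = [Process(target=worker, args=(shared_x, data))
                 for _ in range(N_WORKERS)]
        for p in procs: p.start()
        for p in procs: p.join()

Because the workers share x without locks, two updates can interleave and occasionally overwrite each other; the point of the paper's analysis is that when each example touches few coordinates, such collisions are rare and introduce negligible error into the final iterate.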