SIGNSGD: Compressed Optimisation for Non-Convex Problems

7 Aug 2018 | Jeremy Bernstein, Yu-Xiang Wang, Kamyar Azizzadenesheli, Anima Anandkumar
SIGNSGD is a gradient-compression method for non-convex optimisation that transmits only the sign of each stochastic gradient coordinate, yet achieves convergence rates comparable to standard SGD. The method is particularly effective when gradients are dense, in which case its theoretical rate matches that of SGD. A momentum variant, SIGNUM, matches the accuracy and speed of ADAM on deep neural networks. In the distributed setting, aggregating gradient signs by majority vote enables 1-bit communication per parameter while still achieving the same variance reduction as full-precision distributed SGD. The theoretical analysis shows that SIGNSGD and SIGNUM can outperform SGD when gradients are dense and, under certain conditions, remain as effective as SGD when gradients are sparse. Experiments on CIFAR-10 and ImageNet show SIGNSGD performing comparably to SGD, suggesting that practical deep-learning problems lie in a regime where both methods are effective. Its communication efficiency and convergence guarantees make it a promising approach for fast communication and convergence in distributed learning.
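The sketch below illustrates the updates described above: the sign-only step of SIGNSGD, the momentum variant SIGNUM, and the majority-vote aggregation used in the distributed setting. It is a minimal NumPy illustration, not the authors' reference code; function names such as `sign_sgd_step` and `majority_vote_step` and the toy hyperparameters are assumptions made for clarity.

```python
import numpy as np

def sign_sgd_step(params, grad, lr):
    """signSGD: move each coordinate by the learning rate times the sign of
    its stochastic gradient, discarding the magnitude."""
    return params - lr * np.sign(grad)

def signum_step(params, grad, momentum, lr, beta=0.9):
    """Signum: apply the sign to an exponential moving average (momentum)
    of the stochastic gradients."""
    momentum = beta * momentum + (1.0 - beta) * grad
    return params - lr * np.sign(momentum), momentum

def majority_vote_step(params, worker_grads, lr):
    """Distributed signSGD with majority vote: each worker sends only the
    sign of its gradient (1 bit per coordinate); the server sums the signs
    and broadcasts the sign of that sum back to the workers."""
    vote = np.sign(sum(np.sign(g) for g in worker_grads))
    return params - lr * vote

# Toy usage (hypothetical example): minimise f(x) = ||x||^2 with noisy
# gradients reported by 4 workers.
rng = np.random.default_rng(0)
x = rng.normal(size=5)
for _ in range(100):
    grads = [2 * x + 0.1 * rng.normal(size=5) for _ in range(4)]
    x = majority_vote_step(x, grads, lr=0.01)
print(x)  # coordinates should be driven close to zero
```

Because only signs cross the network in both directions, each worker exchanges one bit per parameter per step, which is the source of the communication savings claimed above.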