29 Dec 2017 | Wei Wen, Cong Xu, Feng Yan, Chunpeng Wu, Yandan Wang, Yiran Chen, Hai Li
TernGrad is a novel approach designed to reduce communication costs in distributed deep learning by using ternary gradients. The method quantizes gradients into three levels {-1, 0, 1}, significantly reducing the communication overhead. The authors mathematically prove the convergence of TernGrad under a gradient bound assumption and propose techniques such as layer-wise ternarizing and gradient clipping to improve its performance. Experiments on various models, including AlexNet and GoogLeNet, show that TernGrad can achieve accuracy levels comparable to standard SGD while reducing communication time. The method also demonstrates scalability and efficiency in large-scale distributed training, with significant speed gains observed across different GPU clusters.
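To make the quantization step concrete, here is a minimal NumPy sketch of stochastic ternarization in the spirit of the paper: each gradient component keeps its sign with probability proportional to its magnitude relative to the tensor's maximum, so the ternary gradient is unbiased in expectation. The function name `ternarize` and the NumPy framing are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def ternarize(grad, rng=np.random.default_rng()):
    """Stochastically quantize a gradient tensor to {-s, 0, +s}.

    s = max(|grad|) is the scaler; each component keeps its sign with
    probability |g_i| / s and is zeroed otherwise, so the quantized
    gradient equals the original gradient in expectation.
    """
    s = np.max(np.abs(grad))
    if s == 0:
        return np.zeros_like(grad)
    # Bernoulli mask: P(keep component i) = |g_i| / s
    mask = rng.random(grad.shape) < (np.abs(grad) / s)
    # In practice the scaler can be computed per layer (layer-wise
    # ternarizing), and gradients can be clipped beforehand to reduce
    # the influence of outliers on s.
    return s * np.sign(grad) * mask

# Example: a worker ternarizes its local gradient before communication;
# only the scalar s and the ternary codes {-1, 0, 1} need to be sent.
g = np.array([0.8, -0.2, 0.05, -0.6])
print(ternarize(g))
```

Because each component needs only two bits plus one shared floating-point scaler per tensor, the per-step communication volume shrinks substantially compared with sending full 32-bit gradients.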