Training Deep Nets with Sublinear Memory Cost

22 Apr 2016 | Tianqi Chen, Bing Xu, Chiyuan Zhang, Carlos Guestrin
The paper "Training Deep Nets with Sublinear Memory Cost" by Tianqi Chen, Bing Xu, Chiyuan Zhang, and Carlos Guestrin proposes a systematic approach to reduce the memory consumption during the training of deep neural networks. The authors design an algorithm that achieves an $O(\sqrt{n})$ memory cost for training an $n$-layer network, with only an extra forward pass per mini-batch in computational cost. This approach allows for the exploration of deeper and more complex models, which are often memory-limited by current GPU capabilities. The focus is on reducing the memory used to store intermediate feature maps and gradients, using computation graph analysis for automatic in-place operations and memory sharing optimizations. The paper demonstrates that it is possible to trade computation for memory, achieving a sublinear memory cost of $O(\log n)$ with minimal extra computational cost. Experiments show significant memory savings, such as reducing the memory cost from 48GB to 7GB for a 1,000-layer deep residual network on ImageNet problems. The method also enables training of complex recurrent neural networks on very long sequences. The paper provides guidelines for deep learning frameworks to incorporate these memory optimization techniques and makes the implementation publicly available.The paper "Training Deep Nets with Sublinear Memory Cost" by Tianqi Chen, Bing Xu, Chiyuan Zhang, and Carlos Guestrin proposes a systematic approach to reduce the memory consumption during the training of deep neural networks. The authors design an algorithm that achieves an $O(\sqrt{n})$ memory cost for training an $n$-layer network, with only an extra forward pass per mini-batch in computational cost. This approach allows for the exploration of deeper and more complex models, which are often memory-limited by current GPU capabilities. The focus is on reducing the memory used to store intermediate feature maps and gradients, using computation graph analysis for automatic in-place operations and memory sharing optimizations. The paper demonstrates that it is possible to trade computation for memory, achieving a sublinear memory cost of $O(\log n)$ with minimal extra computational cost. Experiments show significant memory savings, such as reducing the memory cost from 48GB to 7GB for a 1,000-layer deep residual network on ImageNet problems. The method also enables training of complex recurrent neural networks on very long sequences. The paper provides guidelines for deep learning frameworks to incorporate these memory optimization techniques and makes the implementation publicly available.