Training Deep Nets with Sublinear Memory Cost

22 Apr 2016 | Tianqi Chen, Bing Xu, Chiyuan Zhang, Carlos Guestrin
This paper proposes a systematic approach to reducing the memory consumption of deep neural network training. The key result is an algorithm that trains an n-layer network using only O(√n) memory, at the cost of one extra forward pass per mini-batch. The method focuses on reducing the memory needed to store intermediate feature maps and gradients during training; by analyzing the computation graph, it also performs automatic in-place operations and memory sharing. The underlying principle is trading computation for memory: in the extreme case, memory consumption can be reduced to O(log n) at an O(n log n) extra cost in forward computation.

Methodologically, the paper builds on computation graphs with liveness analysis and on the gradient checkpointing technique. It introduces a general gradient graph construction algorithm that applies the same memory-optimization idea during backpropagation, together with a simple heuristic: drop the results of low-cost operations, which are cheap to recompute, and keep the results of time-consuming operations such as convolutions. The approach works for general deep neural networks, both convolutional and recurrent, and the paper gives guidelines for deep learning frameworks to incorporate these memory optimizations.

Experiments show that the memory cost of training a 1,000-layer deep residual network on ImageNet can be reduced from 48 GB to 7 GB, with similarly large reductions when training complex recurrent neural networks on very long sequences. The paper also measures the impact on training speed, showing that the memory savings come at only a small computational overhead, and concludes that the approach enables deeper and more complex models to be trained and explored.
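To make the O(√n) scheme concrete, below is a minimal sketch of segment-wise checkpointing on a toy chain of element-wise layers, written in plain NumPy with no framework. The layer_forward, layer_backward, and train_step_checkpointed names are hypothetical illustrations of the technique rather than the paper's MXNet implementation: the forward pass stores only the activations at the √n segment boundaries, and the backward pass recomputes the activations inside each segment, which amounts to roughly one extra forward pass per mini-batch.

```python
import math
import numpy as np

def layer_forward(x, w):
    """One toy layer: elementwise tanh(w * x)."""
    return np.tanh(w * x)

def layer_backward(x, w, grad_out):
    """Gradients of tanh(w * x) w.r.t. the input x and the weight w."""
    local = (1.0 - np.tanh(w * x) ** 2) * grad_out
    return local * w, float(np.sum(local * x))

def train_step_checkpointed(x0, weights, grad_loss):
    """Forward + backward over n layers storing only O(sqrt(n)) activations."""
    n = len(weights)
    seg = max(1, math.isqrt(n))               # segment length ~ sqrt(n)

    # Forward pass: keep only the segment-boundary activations (checkpoints).
    checkpoints = {0: x0}
    x = x0
    for i, w in enumerate(weights):
        x = layer_forward(x, w)
        if (i + 1) % seg == 0:
            checkpoints[i + 1] = x

    # Backward pass: for each segment (in reverse order), re-run the forward
    # from its checkpoint to rebuild the dropped activations, then
    # backpropagate through the segment.
    grad_x = grad_loss(x)                     # dL/d(final activation)
    grad_w = [0.0] * n
    for start in range(seg * ((n - 1) // seg), -1, -seg):
        end = min(start + seg, n)
        acts = [checkpoints[start]]
        for i in range(start, end):           # recomputation: the extra forward
            acts.append(layer_forward(acts[-1], weights[i]))
        for i in range(end - 1, start - 1, -1):
            grad_x, grad_w[i] = layer_backward(acts[i - start], weights[i], grad_x)
    return grad_w

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    weights = list(rng.standard_normal(16))   # 16 toy layers -> segments of 4
    x0 = rng.standard_normal(4)
    # loss = sum(y ** 2)  =>  dL/dy = 2 * y
    grads = train_step_checkpointed(x0, weights, grad_loss=lambda y: 2.0 * y)
    print(len(grads), "weight gradients computed")
```

Peak activation memory in this sketch is the set of checkpoints plus one segment's worth of recomputed activations, both O(√n), while every layer's forward pass runs at most twice.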
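The O(log n) extreme case applies the same drop-and-recompute idea recursively rather than in fixed-size segments. The sketch below reuses the toy layer_forward and layer_backward from the previous block and is again a hypothetical illustration, not the paper's code: a segment is split in half, the forward is recomputed up to the midpoint, and the right half is backpropagated before the left half, so at most one checkpoint per recursion level is alive. Each layer's forward is recomputed O(log n) times, matching the O(n log n) extra forward cost quoted above.

```python
def backward_recursive(x_a, weights, a, b, grad_out):
    """Backprop through layers [a, b) given the stored activation x_a at
    position a and grad_out = dL/dx_b, keeping O(log n) activations alive.
    Returns (dL/dx_a, {layer index: weight gradient})."""
    if b - a == 1:
        grad_a, gw = layer_backward(x_a, weights[a], grad_out)
        return grad_a, {a: gw}
    m = (a + b) // 2
    x_m = x_a
    for i in range(a, m):                     # recompute forward to the midpoint
        x_m = layer_forward(x_m, weights[i])
    grad_m, gw_right = backward_recursive(x_m, weights, m, b, grad_out)  # right half first
    grad_a, gw_left = backward_recursive(x_a, weights, a, m, grad_m)     # then left half
    gw_left.update(gw_right)
    return grad_a, gw_left

# Usage: run one plain forward pass to get the final activation x_final, then
#   grad_x0, grad_w = backward_recursive(x0, weights, 0, len(weights), 2.0 * x_final)
# for the same sum-of-squares toy loss as above.
```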