15 Feb 2018 | Sharan Narang, Gregory Diamos, Erich Elsen, Paulius Micikevicius, Jonah Alben, David Garcia, Boris Ginsburg, Michael Houston, Oleksii Kuchaiev, Ganesh Venkatesh, Hao Wu
Mixed precision training uses half-precision (FP16) floating-point numbers to train deep neural networks without losing accuracy or requiring changes to hyperparameters. This approach nearly halves memory usage and speeds up arithmetic on recent GPUs. Weights, activations, and gradients are stored in FP16, but three techniques are used to prevent loss of critical information: maintaining a FP32 master copy of weights, loss-scaling to preserve small gradient values, and using FP16 arithmetic with accumulation in FP32. This methodology works across various tasks and large-scale models with over 100 million parameters, trained on large datasets.
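To make the underflow problem concrete: FP16 cannot represent magnitudes much below 2^-24 (about 6e-8), so very small gradient values round to zero unless they are scaled up before the backward pass. A minimal NumPy sketch with made-up gradient magnitudes (not values from the paper):

```python
import numpy as np

# Illustrative gradient magnitudes (made up); FP16 flushes values below ~2**-24 (~6e-8) to zero.
grads_fp32 = np.array([1e-3, 1e-6, 1e-8, 1e-9], dtype=np.float32)
print(grads_fp32.astype(np.float16))            # the two smallest values become 0.0 in FP16

# Scaling the loss (and therefore the gradients) by a constant shifts small values back
# into FP16's representable range; they are unscaled again before the weight update.
scale = np.float32(1024.0)
scaled_fp16 = (grads_fp32 * scale).astype(np.float16)
print(scaled_fp16.astype(np.float32) / scale)   # all four values survive, up to rounding error
```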
The paper introduces three key techniques for mixed precision training: (1) maintaining a FP32 master copy of weights for updates, (2) loss-scaling to preserve small gradient values, and (3) using FP16 arithmetic with accumulation in FP32. These techniques allow training of a wide variety of network architectures and applications, including image classification, image generation, object detection, language modeling, machine translation, and speech recognition, without accuracy loss compared to FP32 training.
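As a rough illustration of how these techniques fit together, here is a minimal sketch of one mixed precision training step, assuming PyTorch, a toy linear model, and random data (the model, loss, and fixed scale factor of 1024 are illustrative choices, not the paper's exact setup):

```python
import torch

torch.manual_seed(0)
master_w = torch.randn(16, 1, requires_grad=True)   # (1) FP32 master copy of the weights
optimizer = torch.optim.SGD([master_w], lr=0.01)
loss_scale = 1024.0                                  # (2) fixed loss-scaling factor (illustrative)

x, y = torch.randn(32, 16), torch.randn(32, 1)       # toy data

for step in range(10):
    w16 = master_w.half()                            # FP16 copy used for forward and backward
    # (3) FP16 arithmetic for the products, with the reduction accumulated in FP32
    out = (x.half() * w16.t()).float().sum(dim=1, keepdim=True)
    loss = ((out - y) ** 2).mean()                   # loss kept in FP32

    optimizer.zero_grad()
    (loss * loss_scale).backward()                   # scale the loss so small FP16 gradients survive
    master_w.grad.div_(loss_scale)                   # unscale before the weight update
    optimizer.step()                                 # update applied to the FP32 master weights
```

The scale here is a fixed constant; in practice it is chosen, or adapted during training, so that the scaled gradients stay within FP16 range without overflowing.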
Experiments show that mixed precision training works for CNNs, detection networks, speech recognition, machine translation, and language modeling. For ILSVRC image classification, mixed precision training matched the top-1 accuracy of FP32 training. Detection networks such as Faster-RCNN and Multibox-SSD matched or exceeded FP32 accuracy once loss-scaling was applied. Speech recognition models such as DeepSpeech 2 reached error rates comparable to, and in some cases slightly better than, their FP32 baselines. Machine translation models trained on the WMT15 dataset likewise achieved results comparable to FP32 training.
The paper concludes that mixed precision training is an important technique that reduces memory consumption and speeds up training and inference. It allows many different deep learning models to be trained without accuracy loss, and introduces loss-scaling to preserve the large numbers of small-magnitude gradient values these models produce. Future work includes further optimizations for mixed precision training and extending the technique to generative models and deep reinforcement learning applications.