Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

2 Mar 2015 | Sergey Ioffe, Christian Szegedy
Batch Normalization is a technique that accelerates the training of deep neural networks by reducing internal covariate shift: the change in the distribution of each layer's inputs during training, which complicates optimization and requires careful initialization and lower learning rates. Batch Normalization addresses this by normalizing layer inputs and making the normalization part of the model architecture. This allows much higher learning rates, reduces the need for careful initialization, and also acts as a regularizer, in some cases eliminating the need for Dropout. Applied to a state-of-the-art image classification model, Batch Normalization matches the original accuracy with 14 times fewer training steps and ultimately surpasses the original model. An ensemble of batch-normalized networks improves on the best published result on ImageNet classification, reaching 4.9% top-5 validation error (4.8% test error), exceeding the accuracy of human raters.

The method normalizes layer inputs using mini-batch statistics, giving each activation zero mean and unit variance. The normalization is applied independently to each activation, keeping its input distribution stable, and it is differentiable, so it does not require analyzing the full training set after every parameter update. For each normalized activation, a learned scale and shift are trained alongside the rest of the model, ensuring the network can still represent the original activations and preserving its capacity. During training, gradients are backpropagated through the normalization itself.
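As a rough illustration of the training-time transform described above, here is a minimal NumPy sketch (the function and variable names are illustrative, not taken from the paper or any library): each feature is normalized with the statistics of the current mini-batch, then a learned per-feature scale (gamma) and shift (beta) restore the layer's ability to represent the original activations.

```python
import numpy as np

def batch_norm_train(x, gamma, beta, eps=1e-5):
    """Batch-normalize a mini-batch x of shape (batch_size, num_features).

    gamma and beta are the learned per-feature scale and shift parameters;
    eps guards against division by zero.
    """
    mu = x.mean(axis=0)                      # per-feature mini-batch mean
    var = x.var(axis=0)                      # per-feature mini-batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)    # zero mean, unit variance per feature
    y = gamma * x_hat + beta                 # learned scale and shift preserve capacity
    return y, mu, var
```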
At inference time, population statistics replace the mini-batch statistics so that the output is deterministic. Batch Normalization enables higher learning rates by stabilizing input distributions and reducing the influence of parameter scale on gradient propagation, and it regularizes the model, reducing the need for Dropout. Experiments show that it accelerates training, improves accuracy, and allows the use of saturating nonlinearities. Applied to the Inception network, Batch Normalization reaches higher accuracy with far fewer training steps, and an ensemble of batch-normalized networks further improves the ImageNet results, surpassing previously published ones. The method works across a range of architectures, including convolutional networks, and has potential applications in recurrent neural networks and domain adaptation.
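To make the inference behaviour concrete, here is a sketch in the same style (reusing the NumPy import above; the population moments are approximated here with exponential moving averages of the mini-batch statistics, a common practical stand-in for the batch averaging described in the paper). The fixed estimates replace the mini-batch statistics, so the output for a given input no longer depends on the other examples in the batch.

```python
def update_running_stats(running_mu, running_var, mu, var, momentum=0.9):
    """Track estimates of the population mean/variance as exponential moving
    averages of the mini-batch statistics seen during training."""
    running_mu = momentum * running_mu + (1.0 - momentum) * mu
    running_var = momentum * running_var + (1.0 - momentum) * var
    return running_mu, running_var

def batch_norm_infer(x, gamma, beta, running_mu, running_var, eps=1e-5):
    """Normalize with the fixed population estimates so the output is
    deterministic and independent of the mini-batch composition."""
    x_hat = (x - running_mu) / np.sqrt(running_var + eps)
    return gamma * x_hat + beta
```

In a full layer, update_running_stats would be called once per training step with the mu and var returned by batch_norm_train, and batch_norm_infer would be used once training is finished.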