21 Jul 2016 | Jimmy Lei Ba, Jamie Ryan Kiros, Geoffrey E. Hinton
Layer normalization is a technique introduced to improve the training speed of various neural network models. Unlike batch normalization, which relies on mini-batch statistics, layer normalization computes normalization statistics from the summed inputs to neurons within a hidden layer on a single training case. This approach eliminates dependencies between training cases and ensures consistent normalization during training and testing. Layer normalization is particularly effective for recurrent neural networks (RNNs), as it allows for straightforward application by computing normalization statistics at each time step. The method stabilizes hidden state dynamics in RNNs and reduces training time compared to previous techniques.
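To make the per-case computation concrete, here is a minimal NumPy sketch (not the authors' reference code; the function name `layer_norm`, the toy hidden size, and the gain/bias initialization are illustrative assumptions). It normalizes the summed inputs of a single training case across its hidden units, applies a learnable gain and bias, and would be invoked identically at every time step of an RNN.

```python
import numpy as np

def layer_norm(a, gain, bias, eps=1e-5):
    """Layer-normalize the summed inputs `a` of ONE training case.

    The mean and standard deviation are computed over the hidden units
    of this single case, so no mini-batch statistics are involved.
    """
    mu = a.mean()
    sigma = a.std()
    return gain * (a - mu) / (sigma + eps) + bias

# Illustrative sizes only.
hidden_size = 8
rng = np.random.default_rng(0)

gain = np.ones(hidden_size)    # learnable per-unit gain, initialized to 1
bias = np.zeros(hidden_size)   # learnable per-unit bias, initialized to 0

# In an RNN, the same normalization is applied to the summed inputs
# at every time step, independently for each sequence.
summed_inputs = rng.normal(size=hidden_size)
hidden_state = np.tanh(layer_norm(summed_inputs, gain, bias))
print(hidden_state.shape)  # (8,)
```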
Layer normalization is invariant to re-scaling and re-centering of the weight matrix, and to re-scaling of the inputs of an individual training case, which makes it robust to such variations in the weights and data. Because growing weights leave the normalized output unchanged while shrinking its gradient, it also acts as an implicit learning-rate reduction that helps stabilize learning towards convergence. Empirical results show that layer normalization improves both training speed and generalization performance across tasks such as image-sentence ranking, question answering, and handwriting sequence generation. In experiments with RNNs, layer normalization converges faster and achieves better results than batch normalization, especially for long sequences and small mini-batches.
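The weight re-scaling invariance can be checked directly. In the sketch below (toy shapes assumed; the learnable gain and bias are omitted for brevity), multiplying the entire weight matrix by a constant scales the per-case mean and standard deviation by the same constant, so the normalized output is essentially unchanged.

```python
import numpy as np

def layer_norm(a, eps=1e-5):
    # Statistics over the hidden units of a single training case.
    return (a - a.mean()) / (a.std() + eps)

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 4))   # hypothetical weight matrix
x = rng.normal(size=4)        # a single training case

out = layer_norm(W @ x)
out_rescaled = layer_norm((2.5 * W) @ x)  # re-scale every weight by 2.5

# The scale factor cancels in (a - mean) / std, up to the small epsilon.
assert np.allclose(out, out_rescaled, atol=1e-4)
```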
The method has been applied to different neural network architectures, including feed-forward networks and convolutional networks, with promising results. While layer normalization shows potential in convolutional networks, further research is needed to optimize its performance in these settings. Overall, layer normalization offers a flexible and effective approach to improving the training efficiency and stability of neural networks.