2017 | Günter Klambauer, Thomas Unterthiner, Andreas Mayr, Sepp Hochreiter
Self-normalizing neural networks (SNNs) are introduced to enable high-level abstract representations in deep learning. Unlike standard feed-forward neural networks (FNNs), SNNs automatically keep neuron activations at zero mean and unit variance without explicit normalization. This is achieved with scaled exponential linear units (SELUs), which induce self-normalizing properties. Using the Banach fixed-point theorem, it is proven that activations close to zero mean and unit variance converge toward this state as they propagate through many network layers, even in the presence of noise and perturbations. This property allows SNNs to train deep networks, employ strong regularization, and learn robustly. Additionally, for activations not close to unit variance, upper and lower bounds on the variance are proven, which prevents vanishing and exploding gradients.
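For reference, a minimal NumPy sketch of the SELU activation, using the constants λ ≈ 1.0507 and α ≈ 1.6733 derived in the paper (this is an illustration, not the authors' reference code; their implementations are in the repository linked below):

```python
import numpy as np

# SELU constants derived in the paper so that zero mean / unit variance
# is a stable and attracting fixed point of the mean-variance mapping.
ALPHA = 1.6732632423543772
LAMBDA = 1.0507009873554805

def selu(x):
    """SELU: lambda * x for x > 0, lambda * alpha * (exp(x) - 1) for x <= 0."""
    x = np.asarray(x, dtype=float)
    # np.minimum keeps expm1 from overflowing on large positive inputs,
    # whose branch is discarded by np.where anyway.
    return LAMBDA * np.where(x > 0, x, ALPHA * np.expm1(np.minimum(x, 0.0)))
```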
SNNs were evaluated on 121 UCI tasks, drug discovery benchmarks, and astronomy tasks, where they outperformed standard FNNs and other machine learning methods such as random forests and support vector machines. SNNs significantly outperformed all competing FNN methods on the 121 UCI tasks, outperformed all competing methods on the Tox21 dataset, and set a new record on an astronomy dataset. The winning SNN architectures are often very deep. Implementations are available at github.com/bioinf-jku/SNNs.
SNNs are robust to perturbations and do not exhibit high variance in their training errors. They push neuron activations toward zero mean and unit variance, which has the same effect as batch normalization and enables robust learning across many layers. SNNs are based on SELUs, which induce self-normalizing properties such as variance stabilization and thereby avoid exploding and vanishing gradients. The mapping g, which propagates the mean and variance of activations from one layer to the next, has a stable and attracting fixed point, so activations converge toward zero mean and unit variance. SNNs therefore preserve the normalization of activations as they propagate through layers, with the normalization effect observed across layers and learning steps. Moreover, the mean and variance remain within predefined intervals, so normalization is transitive across layers. SNNs are constructed by adjusting the properties of the function g through two design choices: the activation function and the weight initialization. SELUs are proposed as the activation function, and the weights are initialized such that ω = 0 and τ = 1 for all units in the higher layer, where ω and τ denote the sum and the sum of squares of a unit's incoming weights. The mean and variance mapping g is derived under the assumption that the inputs are independent but share the same mean and variance.
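To make the construction concrete, the following sketch (an illustration under the assumptions above, not the paper's code) propagates standardized inputs through many random SELU layers initialized "LeCun normal" style with weight mean 0 and variance 1/n, so that each unit's incoming weights satisfy ω = Σ w ≈ 0 and τ = Σ w² ≈ 1; the per-layer mean and variance it prints stay close to the fixed point (0, 1):

```python
import numpy as np

ALPHA, LAMBDA = 1.6732632423543772, 1.0507009873554805

def selu(x):
    return LAMBDA * np.where(x > 0, x, ALPHA * np.expm1(np.minimum(x, 0.0)))

rng = np.random.default_rng(0)
n_units, depth, n_samples = 256, 32, 10_000

# Standardized inputs: zero mean, unit variance per component.
x = rng.standard_normal((n_samples, n_units))

for layer in range(1, depth + 1):
    # Weights with mean 0 and variance 1/n, so each unit's incoming weight
    # vector has omega = sum(w) ~ 0 and tau = sum(w^2) ~ 1.
    W = rng.normal(0.0, np.sqrt(1.0 / n_units), size=(n_units, n_units))
    x = selu(x @ W)
    print(f"layer {layer:2d}: mean = {x.mean():+.3f}, var = {x.var():.3f}")
```

During training ω and τ drift away from these initial values, which is where the attracting fixed point of g and the proven variance bounds keep the activations within the predefined intervals.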