Training Very Deep Networks

23 Nov 2015 | Rupesh Kumar Srivastava, Klaus Greff, Jürgen Schmidhuber
Highway networks are a deep neural network architecture that lets information flow unimpeded across many layers, overcoming the difficulty of training very deep networks. Inspired by Long Short-Term Memory (LSTM) networks, they use adaptive gating units, called transform and carry gates, to regulate information flow: at each layer the network can either transform its input or pass it through unchanged. This makes very deep networks trainable with plain stochastic gradient descent and allows efficient training and generalization even with hundreds of layers.

The paper begins by noting the importance of network depth for high performance in supervised machine learning. Deeper networks can represent certain function classes more efficiently than shallow ones, yet training them remains difficult. Previous remedies include better optimizers, careful initialization strategies, and skip connections, but these methods often break down as depth increases.

Highway networks address this with "information highways": paths along which information can travel across many layers without attenuation, realized through an LSTM-inspired adaptive gating mechanism. The transform gate controls how much of a layer's output is produced by transforming the input, while the carry gate determines how much of the input is carried forward unchanged. Coupling the two gates, so that the carry gate equals one minus the transform gate, lets each layer smoothly interpolate between passing its input through and transforming it.

Experiments show that highway networks with hundreds of layers can be trained directly using stochastic gradient descent and perform well on various tasks, including image classification. The paper also demonstrates that highway networks train deep models more efficiently than traditional methods, with results that match or exceed those of other approaches.
The architecture's flexibility and ease of training make it a promising solution for deep learning applications.
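The coupled-gate computation described above, y = H(x) · T(x) + x · (1 − T(x)), can be sketched in a few lines. This is an illustrative sketch, not the authors' code: the class name, the weight names `W_H` and `W_T`, and the exact negative gate-bias value are assumptions based on the description, although a negative transform-gate bias (so layers start out carrying their input) is the initialization the paper recommends.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class HighwayLayer:
    """One highway layer: y = H(x) * T(x) + x * (1 - T(x)).

    Illustrative sketch; weight names and bias value are assumptions,
    not taken from the paper's released code.
    """

    def __init__(self, dim, gate_bias=-2.0, seed=0):
        rng = np.random.default_rng(seed)
        scale = 1.0 / np.sqrt(dim)
        # Affine transform H(x) weights.
        self.W_H = rng.uniform(-scale, scale, (dim, dim))
        self.b_H = np.zeros(dim)
        # Transform gate T(x) weights; a negative bias pushes T(x)
        # toward 0 initially, so the layer starts close to identity.
        self.W_T = rng.uniform(-scale, scale, (dim, dim))
        self.b_T = np.full(dim, gate_bias)

    def forward(self, x):
        h = np.tanh(x @ self.W_H + self.b_H)   # transformed input H(x)
        t = sigmoid(x @ self.W_T + self.b_T)   # transform gate T(x) in (0, 1)
        return h * t + x * (1.0 - t)           # carry gate is 1 - T(x)
```

Because the carry path is the raw input scaled by 1 − T(x), a stack of such layers can pass information (and gradients) through nearly unchanged wherever the gates stay near zero, which is what makes hundreds of layers trainable by simple gradient descent.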