Identity Mappings in Deep Residual Networks


25 Jul 2016 | Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun
This paper investigates the role of identity mappings in deep residual networks (ResNets) and their impact on training and generalization. The authors analyze the propagation formulations behind residual blocks and show that using identity mappings both as the skip connection and as the after-addition activation allows signals to propagate directly between any two blocks, in both the forward and backward passes (see the formulation sketched below). This direct propagation makes very deep networks easier to train and improves generalization.

Building on this analysis, the authors propose a new residual unit that uses pre-activation (batch normalization and ReLU applied before the weight layers) instead of the original post-activation design. This simplifies information flow, allows smoother gradient propagation, and reduces optimization difficulty. The paper also compares alternative skip connections, such as constant scaling, gating, and 1×1 convolutions, and finds that plain identity shortcuts are the most effective for information propagation.

Experimentally, the proposed pre-activation ResNet-1001 achieves 4.62% error on CIFAR-10 and improved results on CIFAR-100, and a 200-layer pre-activation ResNet improves performance on ImageNet. The results show that, with identity mappings and pre-activation, increasing depth continues to improve accuracy, and the paper concludes that these two ingredients are essential for effective, very deep residual networks.
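For readers who want the formulation the summary alludes to, the propagation analysis in the paper can be written compactly as follows, where x_l is the input to the l-th residual unit, F is the residual function with weights W_l, and E is the loss. The additive "1" term in the last line is the key point: the gradient at any deep unit reaches every shallower unit directly, without passing through the weight layers.

```latex
% Propagation under identity mappings (notation follows the paper).
\begin{align*}
  x_{l+1} &= x_l + \mathcal{F}(x_l, \mathcal{W}_l) \\
  x_L     &= x_l + \sum_{i=l}^{L-1} \mathcal{F}(x_i, \mathcal{W}_i) \\
  \frac{\partial \mathcal{E}}{\partial x_l}
          &= \frac{\partial \mathcal{E}}{\partial x_L}
             \left( 1 + \frac{\partial}{\partial x_l}
                        \sum_{i=l}^{L-1} \mathcal{F}(x_i, \mathcal{W}_i) \right)
\end{align*}
```

Any modification of the shortcut (scaling, gating, 1×1 convolutions) multiplies extra factors into these sums, which is why the paper finds plain identity shortcuts propagate information best.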
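As an illustration of the pre-activation unit described above, here is a minimal sketch in PyTorch. This is not the authors' code: the class name PreActBlock and the channel sizes are illustrative, and the projection needed when dimensions change between stages is omitted.

```python
import torch
import torch.nn as nn


class PreActBlock(nn.Module):
    """Pre-activation residual unit: BN -> ReLU -> conv, applied twice,
    with an identity shortcut added to the output and no activation after the addition."""

    def __init__(self, channels):
        super().__init__()
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)

    def forward(self, x):
        # "Pre-activation": BN and ReLU come before each weight layer.
        out = self.conv1(torch.relu(self.bn1(x)))
        out = self.conv2(torch.relu(self.bn2(out)))
        # Identity shortcut: the input is added back unchanged, so the signal
        # propagates directly between units in both forward and backward passes.
        return x + out


if __name__ == "__main__":
    # Quick shape check on a dummy feature map.
    block = PreActBlock(channels=16)
    x = torch.randn(2, 16, 32, 32)
    print(block(x).shape)  # torch.Size([2, 16, 32, 32])
```

Unlike the original post-activation design, no ReLU follows the addition, so the shortcut path remains a pure identity from block to block.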