23 May 2016 | Sergey Zagoruyko and Nikos Komodakis
This paper introduces Wide Residual Networks (WRNs), an architecture that improves on the original ResNet design by increasing network width rather than depth. WRNs achieve better accuracy and efficiency than their deep and thin counterparts: even a simple 16-layer-deep WRN outperforms thousand-layer deep residual networks on CIFAR-10, CIFAR-100, and SVHN. WRNs are also significantly faster to train, in some cases up to 8 times faster than very deep networks of comparable accuracy.
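The paper denotes these networks WRN-n-k, where n is the total depth and k the widening factor. As a rough sketch of that parameterization (the helper functions and the 6N + 4 depth convention for two-convolution blocks are our reading of the paper, not code from it), width is set by scaling the baseline channel counts by k while depth is chosen independently:

```python
def wrn_group_widths(k):
    """Channel widths of the three convolutional groups in a WRN-n-k:
    the baseline ResNet widths (16, 32, 64) scaled by the widening factor k."""
    return [16 * k, 32 * k, 64 * k]


def wrn_depth(blocks_per_group):
    """Total depth n of a WRN built from two-convolution B(3,3) blocks,
    assuming the common n = 6*N + 4 convention (N blocks per group)."""
    return 6 * blocks_per_group + 4


# Example: WRN-28-10 uses N = 4 blocks per group and widths (160, 320, 640).
assert wrn_depth(4) == 28
print(wrn_group_widths(10))  # [160, 320, 640]
```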
The paper explores several aspects of residual block design: the type and size of the convolutions in a block, the number of convolutional layers per block, and the width of the block (the widening factor k). Across these experiments, widening residual blocks improves performance more reliably than adding depth; a 16-layer WRN matches or exceeds the accuracy of a 1001-layer thin ResNet with a comparable number of parameters while training several times faster.
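A minimal PyTorch sketch of such a wide block is shown below, assuming the pre-activation BN-ReLU-conv ordering and the B(3,3) structure (two 3×3 convolutions) that the paper settles on; the class and argument names are ours, not the authors'.

```python
import torch
import torch.nn as nn


class WideBasicBlock(nn.Module):
    """Pre-activation residual block B(3,3): two 3x3 convolutions, each
    preceded by batch norm and ReLU, plus an identity (or 1x1 projection)
    shortcut. Width is controlled by choosing out_planes = base_width * k."""

    def __init__(self, in_planes, out_planes, stride=1):
        super().__init__()
        self.bn1 = nn.BatchNorm2d(in_planes)
        self.conv1 = nn.Conv2d(in_planes, out_planes, kernel_size=3,
                               stride=stride, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_planes)
        self.conv2 = nn.Conv2d(out_planes, out_planes, kernel_size=3,
                               stride=1, padding=1, bias=False)
        # 1x1 projection when the channel count or spatial resolution changes.
        self.shortcut = (nn.Conv2d(in_planes, out_planes, kernel_size=1,
                                   stride=stride, bias=False)
                         if stride != 1 or in_planes != out_planes
                         else nn.Identity())

    def forward(self, x):
        out = self.conv1(torch.relu(self.bn1(x)))
        out = self.conv2(torch.relu(self.bn2(out)))
        return out + self.shortcut(x)


# Example: first block of the second group in a k = 10 network (160 -> 320 channels).
block = WideBasicBlock(160, 320, stride=2)
y = block(torch.randn(1, 160, 32, 32))  # -> shape (1, 320, 16, 16)
```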
The paper also investigates dropout inside residual blocks, finding that it improves performance and counteracts the overfitting that comes with the larger parameter counts of wide networks. Dropout is applied between the convolutional layers of each block and yields consistent accuracy gains. With dropout, WRNs achieve state-of-the-art results on CIFAR-10 and SVHN, with a 16-layer WRN reaching a 1.64% error rate on SVHN.
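Concretely, the dropout layer sits between a block's two convolutions, after batch normalization and ReLU. The sketch below extends the WideBasicBlock class above to show that placement; the 0.3 rate is an illustrative value rather than a prescription from the paper.

```python
import torch
import torch.nn as nn


class WideBasicBlockWithDropout(WideBasicBlock):
    """The wide block from the previous sketch with dropout inserted
    between its two convolutions, after the second BN + ReLU."""

    def __init__(self, in_planes, out_planes, stride=1, drop_rate=0.3):
        super().__init__(in_planes, out_planes, stride)
        self.dropout = nn.Dropout(p=drop_rate)  # drop_rate is illustrative

    def forward(self, x):
        out = self.conv1(torch.relu(self.bn1(x)))
        out = self.dropout(torch.relu(self.bn2(out)))  # dropout between the two convs
        out = self.conv2(out)
        return out + self.shortcut(x)
```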
The paper concludes that the main power of residual networks lies in their residual blocks, not in extreme depth. WRNs are several times faster to train than deep networks and can achieve similar or better performance with fewer layers. The findings suggest that increasing network width is more effective than increasing depth for improving performance in residual networks.