6 Feb 2015 | Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun
This paper explores the use of rectified activation units (rectifiers) in neural networks for image classification, focusing on two main aspects: proposing a Parametric Rectified Linear Unit (PReLU) and developing a robust initialization method for training deep rectifier networks. PReLU generalizes the traditional ReLU by allowing the slope of the negative part to be learned, improving model fitting with minimal computational cost and overfitting risk. The initialization method explicitly models the nonlinearity of rectifiers, enabling the training of extremely deep rectifier models directly from scratch. The authors achieve a top-5 test error of 4.94% on the ImageNet 2012 dataset, surpassing human-level performance (5.1%) for the first time. This result is achieved through the use of PReLU networks, which show a 26% relative improvement over the ILSVRC 2014 winner (GoogLeNet, 6.66%). The paper also discusses the architecture designs and implementation details, including the comparison between PReLU and ReLU, and the effectiveness of the proposed initialization method in training very deep models.
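To make the two ideas concrete, here is a minimal NumPy sketch of the PReLU activation: it behaves like ReLU for positive inputs and applies a learned slope to negative inputs. The function name `prelu` and the scalar slope are illustrative, not the authors' code; the paper learns one slope per channel (or a single shared slope) and initializes it to 0.25.

```python
import numpy as np

def prelu(y, a):
    # PReLU: identity for positive inputs, learned slope `a` for negative inputs.
    # f(y) = max(0, y) + a * min(0, y); with a = 0 this reduces to plain ReLU.
    return np.maximum(0.0, y) + a * np.minimum(0.0, y)

# Example with the paper's initial slope value a = 0.25.
y = np.array([-2.0, -0.5, 0.0, 1.5])
print(prelu(y, 0.25))  # -> [-0.5, -0.125, 0.0, 1.5]
```

The initialization method (often called He or Kaiming initialization) draws weights from a zero-mean Gaussian whose standard deviation is sqrt(2 / n_l), where n_l is the layer's fan-in (for a convolution, roughly the filter area times the number of input channels); the factor 2 accounts for the rectifier zeroing out half of its inputs on average. A rough sketch for a fully connected layer, with illustrative names:

```python
def he_init(fan_in, fan_out, rng):
    # Zero-mean Gaussian with std sqrt(2 / fan_in), chosen so that activation
    # magnitudes neither explode nor vanish through stacked rectifier layers.
    std = np.sqrt(2.0 / fan_in)
    return rng.normal(0.0, std, size=(fan_in, fan_out))

rng = np.random.default_rng(0)
W = he_init(512, 256, rng)      # e.g. a layer with 512 inputs and 256 outputs
print(round(W.std(), 4))        # roughly sqrt(2/512) ≈ 0.0625
```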