Empirical Evaluation of Rectified Activations in Convolutional Network


27 Nov 2015 | Bing Xu, Naiyan Wang, Tianqi Chen, Mu Li
This paper evaluates the performance of different rectified activation functions in convolutional neural networks (CNNs): the standard rectified linear unit (ReLU), leaky ReLU, parametric ReLU (PReLU), and randomized ReLU (RReLU). The study is conducted on standard image classification tasks, specifically the CIFAR-10 and CIFAR-100 datasets, as well as the National Data Science Bowl (NDSB) competition dataset. The findings suggest that adding a non-zero slope to the negative part of a rectified activation unit consistently improves performance, challenging the common belief that sparsity is the key to ReLU's good results. The paper also finds that a deterministic negative slope, whether fixed or learned, tends to overfit on small datasets, whereas its randomized counterpart is more robust. RReLU in particular shows a clear advantage in reducing overfitting, achieving 75.68% accuracy on the CIFAR-100 test set without ensembling or multiple test passes. The study concludes that although ReLU remains the most popular choice, there is room for improvement with its leaky variants, and further research is needed to understand their behavior on larger datasets.
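The four activation variants differ only in how they treat negative inputs. The following NumPy sketch is meant to make that distinction concrete; the specific values used (a 0.01 leak for leaky ReLU, a passed-in parameter for PReLU, and a uniform sampling range for the RReLU slope) are illustrative assumptions, not a reproduction of the paper's exact experimental configuration.

```python
import numpy as np

def relu(x):
    # Standard ReLU: negative inputs are zeroed out.
    return np.maximum(0.0, x)

def leaky_relu(x, leak=0.01):
    # Leaky ReLU: small fixed slope on the negative part (leak value is an assumption).
    return np.where(x >= 0, x, leak * x)

def prelu(x, a):
    # PReLU: the negative slope `a` is learned during training
    # (per channel in practice); here it is simply passed in.
    return np.where(x >= 0, x, a * x)

def rrelu(x, lower=1/8., upper=1/3., training=True, rng=None):
    # RReLU: during training the negative slope is sampled uniformly
    # per activation; at test time the mean slope (lower + upper) / 2 is used.
    # The [lower, upper] range here is an illustrative choice.
    if training:
        rng = rng or np.random.default_rng()
        a = rng.uniform(lower, upper, size=np.shape(x))
    else:
        a = (lower + upper) / 2.0
    return np.where(x >= 0, x, a * x)
```

The randomized slope in RReLU acts as a form of regularization on the negative part, which is consistent with the paper's observation that it reduces overfitting on the smaller datasets.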