This paper introduces the use of the Rectified Linear Unit (ReLU) as the classification function in a deep neural network (DNN), in place of the conventional Softmax function. ReLU is typically used as an activation function in hidden layers, but this study proposes using it as the classification function at the last layer. The approach takes the activation of the penultimate layer, multiplies it by the weight parameters of the final layer to obtain raw scores, and then thresholds these scores at zero with the ReLU function, i.e., max(0, z). Class predictions are then obtained by applying the arg max function to the thresholded scores.
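For concreteness, the forward pass of this ReLU-based classifier might look like the following minimal NumPy sketch; the function and variable names are illustrative and not taken from the paper.

```python
import numpy as np

def relu_classifier_predict(penultimate_activations, weights, bias):
    """Sketch of the ReLU-as-classifier forward pass described above.

    penultimate_activations: (batch, features) output of the penultimate layer
    weights: (features, n_classes) weight matrix of the final layer
    bias: (n_classes,) bias vector of the final layer
    """
    # Raw class scores from the final (classification) layer.
    scores = penultimate_activations @ weights + bias
    # Threshold the scores at zero with ReLU instead of applying Softmax.
    relu_scores = np.maximum(scores, 0.0)
    # The predicted class is the index of the largest thresholded score.
    return np.argmax(relu_scores, axis=1)

# Toy usage with random values (shapes only; these are not trained parameters).
rng = np.random.default_rng(0)
h = rng.standard_normal((4, 128))   # penultimate-layer activations
W = rng.standard_normal((128, 10))  # final-layer weights
b = np.zeros(10)                    # final-layer bias
print(relu_classifier_predict(h, W, b))
```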
The study compares the performance of DNNs with ReLU classification (DL-ReLU) and Softmax classification (DL-Softmax) on three datasets: MNIST, Fashion-MNIST, and the Wisconsin Diagnostic Breast Cancer (WDBC) dataset. The models were implemented in Keras with a TensorFlow backend and trained with the Adam optimization algorithm. Data preprocessing included normalization and dimensionality reduction via PCA.
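A hedged end-to-end sketch of such a DL-ReLU model on MNIST is given below. The layer sizes, PCA dimensionality, and training hyperparameters are illustrative assumptions rather than the paper's exact configuration, and the loss here simply treats the ReLU outputs as logits for cross-entropy, which may differ from the authors' training setup.

```python
import tensorflow as tf
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Load and flatten MNIST images into feature vectors.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = x_train.reshape(len(x_train), -1).astype("float32")
x_test = x_test.reshape(len(x_test), -1).astype("float32")

# Preprocessing as described above: normalization, then PCA.
# (n_components=64 is an illustrative choice, not the paper's value.)
scaler = StandardScaler().fit(x_train)
pca = PCA(n_components=64).fit(scaler.transform(x_train))
x_train = pca.transform(scaler.transform(x_train))
x_test = pca.transform(scaler.transform(x_test))

# DL-ReLU: the final layer uses ReLU instead of Softmax as the classification function.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(64,)),
    tf.keras.layers.Dense(512, activation="relu"),
    tf.keras.layers.Dense(512, activation="relu"),
    tf.keras.layers.Dense(10, activation="relu"),  # ReLU classification layer
])
model.compile(
    optimizer=tf.keras.optimizers.Adam(),
    # Assumption: the ReLU outputs are treated as logits for cross-entropy.
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)
model.fit(x_train, y_train, epochs=5, batch_size=128,
          validation_data=(x_test, y_test))

# Class predictions via arg max over the ReLU-thresholded scores.
predictions = model.predict(x_test).argmax(axis=1)
```

Swapping the final activation to "softmax" (and setting from_logits=False) would yield the corresponding DL-Softmax baseline for comparison.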
Results showed that the DL-ReLU models performed on par with the DL-Softmax models in terms of accuracy, F1-score, and other classification metrics, and were comparable or slightly better in some cases, although they converged more slowly, particularly in the CNN-based models. The study also noted that the "dying ReLU" problem, in which neurons become inactive and stop contributing to learning, may affect performance, yet the DL-ReLU models remained competitive with the conventional Softmax-based models. Future work could investigate the behavior of DL-ReLU models further through gradient analysis and comparison with other ReLU variants.