This paper studies linear support vector machines (SVMs) as an alternative to the softmax function in the top layer of deep learning models for classification. The authors show that replacing the softmax layer with a linear SVM yields consistent improvements on several benchmark datasets, including MNIST, CIFAR-10, and a facial expression recognition challenge.
The softmax function is the standard choice for the output layer in deep classification networks: it produces a probability distribution over the classes and is trained with a cross-entropy loss. SVMs instead learn a decision boundary that maximizes the margin between classes. The paper uses an L2-SVM, whose squared hinge loss is differentiable, as the top layer of a deep neural network and shows that it can outperform softmax, a gain the authors attribute largely to the regularization effect of the margin-based loss.
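As a rough sketch of the loss in question (not the authors' code; the function name and the C parameter default are illustrative), the one-vs-rest squared hinge loss of an L2-SVM over a batch of class scores can be written in NumPy as follows:

```python
import numpy as np

def l2_svm_loss(scores, labels, num_classes, C=1.0):
    """One-vs-rest squared hinge (L2-SVM) loss.

    scores: (N, K) raw class scores from the top linear layer.
    labels: (N,) integer class labels in [0, K).
    The 0.5*||w||^2 margin term is assumed to be handled separately
    as weight decay on the top-layer weights.
    """
    N = scores.shape[0]
    # +/-1 targets: +1 for the true class, -1 for every other class.
    t = -np.ones((N, num_classes))
    t[np.arange(N), labels] = 1.0
    # Squared hinge: max(1 - t * score, 0)^2, summed over classes,
    # averaged over the batch, and scaled by the penalty C.
    margins = np.maximum(0.0, 1.0 - t * scores)
    return C * np.mean(np.sum(margins ** 2, axis=1))
```

Unlike the plain hinge loss, the squared hinge is differentiable everywhere, which is what makes the end-to-end training described next straightforward.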
The authors train the entire network end to end by backpropagating the gradients of the SVM objective into the lower layers, so the SVM supplies the classification loss while the deep network learns the features. The model is evaluated on three datasets: MNIST, CIFAR-10, and a facial expression recognition challenge. On the facial expression task, the model achieved a private test accuracy of 71.2%, ahead of the second-place team's score.
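To make that backpropagation concrete, the sketch below (a toy illustration under assumed layer sizes, learning rate, and variable names, not the paper's architecture) differentiates the squared hinge loss with respect to the class scores and pushes the resulting gradient through a one-hidden-layer ReLU network:

```python
import numpy as np

def l2_svm_grad(scores, labels, num_classes, C=1.0):
    """Gradient of the squared hinge loss w.r.t. the class scores:
    d/ds [max(1 - t*s, 0)^2] = -2 * t * max(1 - t*s, 0).
    This is the error signal backpropagated into the layers below."""
    N = scores.shape[0]
    t = -np.ones((N, num_classes))
    t[np.arange(N), labels] = 1.0
    margins = np.maximum(0.0, 1.0 - t * scores)
    return C * (-2.0 * t * margins) / N

# Toy end-to-end step: one hidden ReLU layer with a linear SVM on top.
rng = np.random.default_rng(0)
X = rng.normal(size=(32, 100))                 # batch of 32 inputs
y = rng.integers(0, 10, size=32)               # 10 classes
W1 = rng.normal(scale=0.01, size=(100, 64))    # lower-layer weights
W2 = rng.normal(scale=0.01, size=(64, 10))     # SVM-layer weights

h = np.maximum(0.0, X @ W1)                    # ReLU hidden activations
scores = h @ W2                                # linear SVM scores (no softmax)
d_scores = l2_svm_grad(scores, y, num_classes=10)
dW2 = h.T @ d_scores                           # gradient for the SVM layer
d_h = (d_scores @ W2.T) * (h > 0)              # backprop through the ReLU
dW1 = X.T @ d_h                                # gradient for the lower layer
W1 -= 0.01 * dW1                               # one SGD step on each layer
W2 -= 0.01 * dW2
```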
The paper also compares softmax and L2-SVM models on the MNIST and CIFAR-10 datasets. The L2-SVM model outperforms the softmax model, reaching an error rate of 0.87% on MNIST and a test error rate of around 9.5% on CIFAR-10. The authors conclude that replacing the softmax layer with an SVM can improve classification performance, and that further research is needed to explore other multiclass SVM formulations and to understand the extent of the gains.