This paper compares two feature selection methods, concave minimization and support vector machines (SVMs), for finding a separating plane that discriminates between two point sets in n-dimensional feature space using as few of the n features as possible. The concave minimization approach minimizes a weighted sum of distances of misclassified points to two parallel bounding planes while also suppressing as many dimensions (features) as possible; it is solved by a successive linearization algorithm that efficiently finds a sparse solution with good generalization properties. The SVM approach, in addition to minimizing the weighted sum of distances of misclassified points, maximizes the distance between the two bounding planes.

For the SVM, the choice of norm matters: the variant based on the 1-norm of the plane's coefficient vector yields a feature selection method and is formulated as a linear program, whereas the 2-norm variant is a quadratic program and neither it nor the ∞-norm variant suppresses features.

Computational results on six public data sets show that the concave minimization approach selects fewer problem features than the SVM while the two methods achieve comparable 10-fold cross-validation correctness, and that the linear programming formulations are significantly faster than the quadratic programming SVM. The paper concludes that further research is needed to explore the benefits of dual norms for feature selection and to determine which norm is best suited to a given data set.
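The role of the 1-norm can be made concrete. A sketch of the 1-norm SVM in its usual linear-programming form, under notation assumed here rather than quoted from the paper ($A$ the $m \times n$ matrix of points, $D$ the diagonal matrix of $\pm 1$ labels, $e$ a vector of ones, $y$ the slack vector, and $\nu > 0$ the misclassification weight):

```latex
\min_{w,\,\gamma,\,y}\; \nu\, e^{\top} y \;+\; \lVert w \rVert_{1}
\qquad \text{s.t.} \qquad D(Aw - e\gamma) + y \ge e, \quad y \ge 0.
```

Splitting $\lVert w \rVert_1$ with auxiliary variables $v \ge |w|$ turns this into a linear program. The dual-norm connection is that the distance between the bounding planes $x^{\top}w = \gamma \pm 1$, measured in the ∞-norm, equals $2/\lVert w \rVert_1$; minimizing the 1-norm of $w$ therefore maximizes the ∞-norm margin, and every component of $w$ driven to zero drops the corresponding feature.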