Dimensionality Reduction Using Genetic Algorithms

July 2000 | Michael L. Raymer, William F. Punch, Erik D. Goodman, Leslie A. Kuhn, Anil K. Jain
This paper presents a genetic algorithm (GA) approach to dimensionality reduction for pattern recognition that performs feature selection, feature extraction, and classifier training simultaneously. The GA optimizes a vector of feature weights that scale the individual features, in either a linear or nonlinear fashion, together with a masking vector that selects a subset of the features. The technique is combined with the k nearest neighbor (knn) classification rule and compared with classical feature selection and extraction techniques, including sequential floating forward selection (SFFS) and linear discriminant analysis.

The GA-based feature extractor uses feedback from the classifier to search iteratively for a feature transformation that yields high classification accuracy. The GA maintains a population of competing transformations, each encoded as a weight vector and a masking vector, and evaluates every candidate by the accuracy of the knn classifier on the transformed patterns. The search favors transformations that reduce the dimensionality of the transformed patterns while maximizing classification accuracy.
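The Python sketch below is a minimal illustration of this idea, not the authors' implementation: the data arrays, the leave-one-out evaluation, and the per-feature parsimony penalty are assumptions made for the example. It scores one candidate, a weight vector plus a binary mask, by the knn accuracy obtained on the weighted, masked features.

```python
import numpy as np

def knn_accuracy(X, y, k=3):
    """Leave-one-out accuracy of a plain Euclidean k-nn classifier.
    X: (n_samples, n_features) float array; y: integer class labels >= 0."""
    n = len(y)
    correct = 0
    for i in range(n):
        d = np.linalg.norm(X - X[i], axis=1)
        d[i] = np.inf                              # never count the query point as its own neighbor
        votes = np.bincount(y[np.argsort(d)[:k]], minlength=y.max() + 1)
        correct += int(np.argmax(votes) == y[i])
    return correct / n

def fitness(weights, mask, X, y, k=3, penalty=0.01):
    """Score one GA candidate: knn accuracy on the weighted, masked features,
    minus a small (assumed) penalty per retained feature to reward parsimony."""
    kept = mask.astype(bool)
    if not kept.any():                             # degenerate candidate with no features kept
        return 0.0
    Xt = X[:, kept] * weights[kept]                # each surviving feature scaled by its weight
    return knn_accuracy(Xt, y, k) - penalty * kept.sum()
```

The parsimony penalty is one simple way to bias the search toward smaller feature subsets; the paper's actual fitness function combines accuracy with other terms.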
The approach was tested on several data sets drawn from medicine and biochemistry. On the thyroid data, GA/knn achieved 98.48% classification accuracy using only three of the 21 available features; on the appendicitis data, 90.38% accuracy using two of the seven available features; and on the protein-bound water molecule data, 64.20% accuracy using four of the eight available features. In these experiments the GA approach outperformed SFFS in both classification accuracy and the parsimony of the selected feature subsets.

The GA approach combines the benefits of feature selection and feature extraction in a single method. The original features are transformed to produce a new feature set, and the relationship between input and output features need not be linear. Because the transformation is tuned with feedback from the classifier, it can be driven directly toward better classification accuracy. Moreover, since each output feature of the GA feature extractor is based on only a single input feature, the relationship between the original and transformed features remains explicit and easy to identify and analyze.

This interpretability was of primary importance for the protein-bound water data, where the goal was more biochemical than statistical: beyond classifying favorable water-binding sites on protein surfaces at 64.20% accuracy with four of the eight available features, the method identified the features that lead to conserved water binding, providing a deeper understanding of the data.
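Building on the hypothetical fitness function above, the following toy GA loop is a deliberately simplified stand-in for the authors' GA: the population size, one-point crossover, Gaussian weight mutation, and bit-flip mask mutation rate are all illustrative assumptions. It shows how an evolved weight and mask pair can be read off directly to see which features were retained and how strongly each is scaled, which is the interpretability property described above.

```python
import numpy as np  # reuses knn_accuracy/fitness from the sketch above

rng = np.random.default_rng(0)

def evolve(X, y, pop_size=30, generations=50, k=3):
    """Toy GA over (weights, mask) candidates; all GA settings here are illustrative."""
    n_feat = X.shape[1]
    weights = rng.random((pop_size, n_feat))            # real-valued weights in [0, 1]
    masks = rng.integers(0, 2, (pop_size, n_feat))      # binary feature masks

    def tournament(scores):
        i, j = rng.integers(0, pop_size, 2)
        return i if scores[i] >= scores[j] else j

    for _ in range(generations):
        scores = np.array([fitness(w, m, X, y, k) for w, m in zip(weights, masks)])
        new_w, new_m = [], []
        for _ in range(pop_size):
            a, b = tournament(scores), tournament(scores)
            cut = rng.integers(1, n_feat)               # one-point crossover
            w = np.concatenate([weights[a][:cut], weights[b][cut:]])
            m = np.concatenate([masks[a][:cut], masks[b][cut:]])
            w = np.clip(w + rng.normal(0.0, 0.05, n_feat), 0.0, 1.0)   # Gaussian weight mutation
            m = np.where(rng.random(n_feat) < 0.02, 1 - m, m)          # bit-flip mask mutation
            new_w.append(w)
            new_m.append(m)
        weights, masks = np.array(new_w), np.array(new_m)

    scores = np.array([fitness(w, m, X, y, k) for w, m in zip(weights, masks)])
    best = int(np.argmax(scores))
    return weights[best], masks[best]

# Interpreting the result: which features were kept, and how strongly each is scaled.
# best_w, best_m = evolve(X, y)
# kept = np.flatnonzero(best_m)
# print({f"feature {i}": round(float(best_w[i]), 3) for i in kept})
```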