06 January 2006 | Ramón Díaz-Uriarte* and Sara Alvarez de Andrés
This article presents a method for gene selection and classification of microarray data using random forest. Random forest is a classification algorithm that performs well with microarray data, even when most predictive variables are noise, and can be used for both two-class and multi-class problems. It also provides measures of variable importance, which are useful for gene selection. The authors propose a new method of gene selection based on random forest, which yields very small sets of genes while preserving predictive accuracy. They compare the performance of random forest with other classification methods, including DLDA, KNN, and SVM, using simulated and real microarray data sets.
The results show that random forest has comparable performance to these methods and that the new gene selection procedure yields very small sets of genes. The authors also evaluate the stability of the selected genes and find that the results are somewhat stable, although not entirely unique. The method is applicable to both two-class and multi-class problems and does not require pre-selection of genes or fine-tuning of parameters. The authors conclude that random forest and gene selection using random forest should become part of the standard tool-box for microarray data analysis.
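The summary does not spell out the selection procedure, so the following is only a minimal sketch of one way importance-driven gene selection with random forest could look. It assumes scikit-learn and NumPy, uses scikit-learn's impurity-based `feature_importances_` as the importance measure, and picks the smallest gene subset by out-of-bag (OOB) error. The function name `select_genes`, the parameters `drop_fraction` and `tolerance`, and the synthetic data are all hypothetical choices made for illustration, not the authors' implementation.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def select_genes(X, y, drop_fraction=0.2, tolerance=0.0, n_trees=100, seed=0):
    """Backward elimination guided by random forest variable importance.

    Hypothetical helper: repeatedly fits a forest, records its OOB error,
    and discards the least important fraction of the remaining genes.
    Returns (column indices of the chosen genes, their OOB error).
    """
    active = np.arange(X.shape[1])
    history = []  # (gene indices, OOB error), from largest to smallest subset
    while True:
        forest = RandomForestClassifier(
            n_estimators=n_trees, oob_score=True, random_state=seed
        ).fit(X[:, active], y)
        history.append((active.copy(), 1.0 - forest.oob_score_))
        if active.size <= 2:
            break
        # Keep the most important genes; the subset always shrinks.
        n_keep = max(2, int(active.size * (1.0 - drop_fraction)))
        order = np.argsort(forest.feature_importances_)[::-1]
        active = active[order[:n_keep]]

    best_error = min(err for _, err in history)
    # Assumed rule: smallest subset whose OOB error is within
    # `tolerance` of the best error seen across all subsets.
    for genes, err in reversed(history):
        if err <= best_error + tolerance:
            return np.sort(genes), err


# Synthetic stand-in for a microarray: 60 samples, 50 "genes",
# of which only the first two carry class signal.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 30)
X = rng.normal(size=(60, 50))
X[y == 1, :2] += 2.0

selected, oob_error = select_genes(X, y)
print("selected genes:", selected, "OOB error:", round(oob_error, 3))
```

Because the OOB error comes from the forest itself, this sketch needs no separate validation split, which is convenient with the small sample sizes typical of microarray studies. Note that an OOB error used to choose the subset is optimistic as an estimate of final predictive accuracy; an outer resampling loop would be needed for an honest estimate.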