May 14, 2002 | Christophe Ambroise and Geoffrey J. McLachlan
The article by Ambroise and McLachlan addresses selection bias in gene extraction from microarray gene-expression data, particularly in the context of cancer diagnosis and treatment. They note that although earlier work has shown that prediction rules with low prediction error can be built from a small number of genes, those results often do not account for selection bias. The bias arises when a prediction rule is tested on tissue samples that were also used to select the genes in the rule, so the test samples are not independent of the selection step. The authors propose two ways to correct for it, 10-fold cross-validation and the .632+ bootstrap error estimate, each applied so that gene selection is part of what is being validated.
They demonstrate that when these corrections are applied, the cross-validated error rate for a subset of only a few genes is no longer zero, contrary to what had previously been reported. The study uses two published datasets (colon and leukemia) to illustrate both the selection bias and the effectiveness of the proposed corrections. The results show that the prediction error is substantially higher than the estimate obtained without correcting for selection bias, which underlines the importance of these corrections in obtaining reliable prediction rules.
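
The effect described above can be illustrated with a small simulation. The following is a minimal sketch on pure-noise data, not a reproduction of the article's analysis. The function names (`select_genes`, `predict`, `cv_error`), the mean-difference gene ranking, and the nearest-centroid classifier are all assumptions chosen for brevity and are stand-ins for the selection procedures and classifiers used in the article. Because the simulated genes carry no class information, an honest error estimate should sit near 50%, while selecting genes on all samples before cross-validating tends to report a much lower error.

```python
import random


def select_genes(X, y, rows, k):
    """Rank genes by absolute difference in class means over `rows`; keep the top k."""
    n_genes = len(X[0])
    scores = []
    for g in range(n_genes):
        a = [X[i][g] for i in rows if y[i] == 0]
        b = [X[i][g] for i in rows if y[i] == 1]
        scores.append(abs(sum(a) / len(a) - sum(b) / len(b)))
    return sorted(range(n_genes), key=lambda g: scores[g], reverse=True)[:k]


def predict(X, y, train, genes, i):
    """Nearest-centroid prediction for sample i, using only the chosen genes."""
    dists = []
    for c in (0, 1):
        members = [r for r in train if y[r] == c]
        centroid = [sum(X[r][g] for r in members) / len(members) for g in genes]
        dists.append(sum((X[i][g] - m) ** 2 for g, m in zip(genes, centroid)))
    return 0 if dists[0] <= dists[1] else 1


def cv_error(X, y, folds, k, external):
    """Cross-validated error rate.

    external=False: genes are selected once on ALL samples (biased estimate).
    external=True:  genes are re-selected inside each fold's training set,
                    so held-out samples never influence the selection.
    """
    all_rows = list(range(len(y)))
    genes_all = None if external else select_genes(X, y, all_rows, k)
    errors = 0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in all_rows if i not in held_out]
        genes = select_genes(X, y, train, k) if external else genes_all
        errors += sum(predict(X, y, train, genes, i) != y[i] for i in fold)
    return errors / len(y)


# Hypothetical data: 40 samples, 1000 noise "genes", balanced labels.
rng = random.Random(0)
n, p, k = 40, 1000, 10
X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [0] * 20 + [1] * 20

# 10 folds of 4 samples each, assigned after shuffling.
idx = list(range(n))
rng.shuffle(idx)
folds = [idx[f::10] for f in range(10)]

biased = cv_error(X, y, folds, k, external=False)
corrected = cv_error(X, y, folds, k, external=True)
print(f"selection outside CV (biased): {biased:.2f}")
print(f"selection inside CV (corrected): {corrected:.2f}")
```

The only difference between the two estimates is where `select_genes` is called. In the biased case the held-out samples have already helped choose the genes, which is the leak the article warns about; in the corrected case the selection is repeated on each training set alone.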