Prediction error estimation: a comparison of resampling methods

May 19, 2005 | Annette M. Molinaro, Richard Simon, Ruth M. Pfeiffer
The paper "Prediction Error Estimation: A Comparison of Resampling Methods" by Annette M. Molinaro, Richard Simon, and Ruth M. Pfeiffer compares resampling methods for estimating the prediction error of models in genomic studies, where feature selection is a crucial step. The authors focus on the small sample sizes and high-dimensional data that are common in such studies. They evaluate resubstitution, split-sample, v-fold cross-validation (CV) with 2, 5, and 10 folds, leave-one-out cross-validation (LOOCV), and the .632+ bootstrap.
The results show that for small samples, the split-sample and 2-fold CV methods perform poorly due to high bias, while LOOCV, 10-fold CV, and the .632+ bootstrap have the smallest bias for diagonal discriminant analysis, nearest neighbor, and classification trees. LOOCV and 10-fold CV have the smallest bias for linear discriminant analysis. Additionally, LOOCV, 5- and 10-fold CV, and the .632+ bootstrap have the lowest mean squared error. The .632+ bootstrap is biased in small samples with strong signal-to-noise ratios. The impact of feature selection on the performance of these methods is highlighted, and the authors conclude that the choice of resampling method should be carefully considered based on the specific characteristics of the data and the model.
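To make the comparison concrete, the following is a minimal sketch of three of the estimators discussed above: resubstitution, v-fold CV, and LOOCV. It is not the authors' code. The nearest-centroid classifier, the mean-difference feature filter, and all function names (`select_features`, `fit`, `predict`, `resubstitution_error`, `v_fold_cv_error`, `loocv_error`) are illustrative assumptions chosen to keep the example dependency-free, and the .632+ bootstrap is omitted for brevity. The point it illustrates is that feature selection is repeated inside every training fold, so the held-out observations never influence which features are chosen.

```python
import random


def select_features(X, y, k):
    """Return indices of the k features whose class means differ the most.

    Hypothetical filter used only for illustration.
    """
    labels = sorted(set(y))
    scores = []
    for j in range(len(X[0])):
        means = []
        for lab in labels:
            vals = [x[j] for x, l in zip(X, y) if l == lab]
            means.append(sum(vals) / len(vals))
        scores.append(max(means) - min(means))
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return ranked[:k]


def fit(X, y, k):
    """Select features on the training data only, then compute class centroids."""
    feats = select_features(X, y, k)
    centroids = {}
    for lab in sorted(set(y)):
        rows = [[x[j] for j in feats] for x, l in zip(X, y) if l == lab]
        centroids[lab] = [sum(col) / len(rows) for col in zip(*rows)]
    return feats, centroids


def predict(model, x):
    """Assign x to the class with the nearest centroid (squared Euclidean distance)."""
    feats, centroids = model
    z = [x[j] for j in feats]

    def sqdist(lab):
        return sum((a - b) ** 2 for a, b in zip(z, centroids[lab]))

    return min(sorted(centroids), key=sqdist)


def resubstitution_error(X, y, k):
    """Error on the same data used for feature selection and fitting."""
    model = fit(X, y, k)
    return sum(predict(model, x) != lab for x, lab in zip(X, y)) / len(y)


def v_fold_cv_error(X, y, k, v, rng):
    """v-fold CV error (v >= 2); feature selection is redone in every training fold."""
    n = len(y)
    idx = list(range(n))
    rng.shuffle(idx)
    folds = [idx[i::v] for i in range(v)]
    wrong = 0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        model = fit([X[i] for i in train], [y[i] for i in train], k)
        wrong += sum(predict(model, X[i]) != y[i] for i in fold)
    return wrong / n


def loocv_error(X, y, k):
    """LOOCV is v-fold CV with v equal to the sample size."""
    return v_fold_cv_error(X, y, k, len(y), random.Random(0))


if __name__ == "__main__":
    # Simulated null data: 20 samples, 500 pure-noise features, balanced labels.
    rng = random.Random(42)
    n, p, k = 20, 500, 10
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    y = [0] * (n // 2) + [1] * (n // 2)
    print("resubstitution:", resubstitution_error(X, y, k))
    print("2-fold CV:     ", v_fold_cv_error(X, y, k, 2, rng))
    print("10-fold CV:    ", v_fold_cv_error(X, y, k, 10, rng))
    print("LOOCV:         ", loocv_error(X, y, k))
```

On null data like this, where the features carry no class information, the true error rate of any classifier is 0.5, so the printed estimates can be compared against that reference. A single run is only one draw. Assessing bias and mean squared error in the sense used above would require repeating the simulation many times and averaging.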