Bias in error estimation when using cross-validation for model selection

23 February 2006 | Sudhir Varma*† and Richard Simon†
This article examines the bias in error estimation when cross-validation (CV) is used for model selection. The study evaluates whether the CV error estimate of an optimized classifier is a valid estimate of its true error on independent data.

Two classifiers, Shrunken Centroids and Support Vector Machines (SVM), were tested on "null" and "non-null" datasets. The "null" datasets had no difference in gene expression between classes, while the "non-null" datasets had differential expression. The study found that the CV error estimate for the optimized classifier was significantly biased downward, underestimating the true error on independent data. For the Shrunken Centroids classifier, the CV error estimate was below 30% on 18.5% of simulated training datasets; for the SVM classifier, the estimated error rate was below 30% on 38% of "null" datasets. Yet the performance of the optimized classifiers on the independent test set was no better than chance.

A nested CV procedure, which uses an inner CV loop for parameter tuning and an outer CV loop for error estimation, substantially reduces this bias: the nested CV error estimate was very close to the true error obtained on the independent test set for both classifiers. The conclusion is that using CV to compute an error estimate for a classifier that has itself been tuned using CV gives a significantly biased estimate of the true error. Proper use of CV for estimating the true error of a classifier requires that all steps of the algorithm, including parameter tuning, be repeated in each CV loop; a nested CV procedure then provides an almost unbiased estimate of the true error.
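The bias described above can be reproduced with a small simulation. The sketch below is illustrative, not the authors' code: it uses a simple nearest-centroid classifier with a tunable number of selected features (a stand-in for the Shrunken Centroids classifier; the grid of k values, dataset sizes, and fold counts are all arbitrary choices for the demonstration). On pure-noise "null" data, where the true error rate is 50%, reporting the tuned classifier's own CV error (the minimum over the tuning grid) understates the true error, while the nested procedure, which tunes only on inner folds of each outer training split, stays close to 50%.

```python
import random
from statistics import mean

def make_null_data(n, p, rng):
    """Null data: features are pure noise, so the true error rate is 50%."""
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    y = [i % 2 for i in range(n)]
    return X, y

def nearest_centroid_error(Xtr, ytr, Xte, yte, k):
    """Select the k features with the largest class-mean gap (computed on the
    training split only), then classify test points by the nearer centroid."""
    rows0 = [i for i, lbl in enumerate(ytr) if lbl == 0]
    rows1 = [i for i, lbl in enumerate(ytr) if lbl == 1]
    gaps = []
    for j in range(len(Xtr[0])):
        m0 = mean(Xtr[i][j] for i in rows0)
        m1 = mean(Xtr[i][j] for i in rows1)
        gaps.append((abs(m0 - m1), j, m0, m1))
    gaps.sort(reverse=True)
    chosen = gaps[:k]
    errors = 0
    for x, lbl in zip(Xte, yte):
        d0 = sum((x[j] - m0) ** 2 for _, j, m0, _ in chosen)
        d1 = sum((x[j] - m1) ** 2 for _, j, _, m1 in chosen)
        errors += (0 if d0 <= d1 else 1) != lbl
    return errors / len(Xte)

def make_folds(n, nfolds, rng):
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[f::nfolds] for f in range(nfolds)]

def cv_error(X, y, k, parts):
    errs = []
    for test in parts:
        test_set = set(test)
        tr = [i for i in range(len(X)) if i not in test_set]
        errs.append(nearest_centroid_error(
            [X[i] for i in tr], [y[i] for i in tr],
            [X[i] for i in test], [y[i] for i in test], k))
    return mean(errs)

GRID = [1, 2, 3, 4, 6, 8, 12, 16, 25, 40]  # tuning grid for k (arbitrary)

def biased_estimate(X, y, rng):
    """The flawed procedure: tune k by CV, then report the tuned
    classifier's own CV error (the minimum over the grid)."""
    parts = make_folds(len(X), 5, rng)
    return min(cv_error(X, y, k, parts) for k in GRID)

def nested_estimate(X, y, rng):
    """Nested CV: an inner CV loop (on the outer training split only)
    picks k; the outer loop estimates the error of the whole procedure."""
    errs = []
    for test in make_folds(len(X), 5, rng):
        test_set = set(test)
        tr = [i for i in range(len(X)) if i not in test_set]
        Xtr, ytr = [X[i] for i in tr], [y[i] for i in tr]
        inner = make_folds(len(Xtr), 4, rng)
        best_k = min(GRID, key=lambda k: cv_error(Xtr, ytr, k, inner))
        errs.append(nearest_centroid_error(
            Xtr, ytr, [X[i] for i in test], [y[i] for i in test], best_k))
    return mean(errs)

rng = random.Random(0)
biased, nested = [], []
for _ in range(20):  # average over simulated null datasets
    X, y = make_null_data(20, 100, rng)
    biased.append(biased_estimate(X, y, rng))
    nested.append(nested_estimate(X, y, rng))
print(f"biased CV estimate: {mean(biased):.3f}   "
      f"nested CV estimate: {mean(nested):.3f}")
```

Averaged over the simulated null datasets, the biased estimate falls noticeably below the true 50% error rate, while the nested estimate stays near it, mirroring the paper's finding that only the nested procedure, in which every step including parameter tuning is repeated inside each CV loop, gives a nearly unbiased estimate.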