January 30, 2024 | J.M. Gorriz, F. Segovia, J. Ramirez*, A. Ortiz, John Suckling
K-fold cross-validation (CV) is a common method for assessing the reliability of machine learning (ML) models, but it has well-known limitations, particularly with small sample sizes and heterogeneous data. This paper proposes a novel statistical test, K-fold Cross Upper Bounding Validation (CUBV), which uses an upper bound on the actual error to quantify the uncertainty associated with estimates drawn from a sample. Grounded in concentration inequalities, CUBV provides a robust criterion for detecting effects and validating accuracy values while avoiding an excess of false positives. The paper highlights the challenges of using ML in group comparisons, including the risks of overfitting and false positives, and discusses the limitations of permutation tests as well as the importance of how data are distributed across folds. Evaluated on real MRI data, CUBV yields better control over false positives and more accurate estimates of model performance than traditional CV, especially in complex and heterogeneous data scenarios. The paper concludes that CUBV is a robust and reliable method for statistical inference in ML applications.
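To make the idea concrete, here is a minimal sketch of upper-bounding a K-fold CV error estimate with a concentration inequality. This is not the authors' exact CUBV construction: the Hoeffding-style bound, the `delta` confidence level, the averaging over folds, and the chance-level decision rule are all assumptions made for this illustration.

```python
# Illustrative sketch: K-fold CV with a Hoeffding-style upper bound on the
# actual error (an assumption for this example, not the paper's exact test).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold

def kfold_upper_bound(X, y, k=10, delta=0.05):
    """Return the mean K-fold CV error and a concentration-based upper bound.

    With probability at least 1 - delta, Hoeffding's inequality bounds the
    actual error by the empirical error plus sqrt(log(1/delta) / (2 * n)),
    where n is the number of held-out test samples.
    """
    skf = StratifiedKFold(n_splits=k, shuffle=True, random_state=0)
    fold_errors, fold_sizes = [], []
    for train_idx, test_idx in skf.split(X, y):
        clf = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
        fold_errors.append(np.mean(clf.predict(X[test_idx]) != y[test_idx]))
        fold_sizes.append(len(test_idx))
    emp_err = float(np.mean(fold_errors))     # empirical CV error
    n_test = int(np.mean(fold_sizes))         # average held-out fold size
    eps = np.sqrt(np.log(1.0 / delta) / (2.0 * n_test))
    return emp_err, emp_err + eps

# Usage (hypothetical data): declare an effect only if the *upper bound*
# beats chance level (0.5 for balanced two-class data), which guards
# against optimistic CV estimates in small, heterogeneous samples.
# emp, ub = kfold_upper_bound(X, y)
# effect_detected = ub < 0.5
```

The design point this sketch captures is that the detection decision is made against the pessimistic bound rather than the raw CV accuracy, so small-sample optimism must be overcome before an effect is declared.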