November 7, 2019 | Andrius Vabalas, Emma Gowen, Ellen Poliakoff, Alexander J. Casson
Machine learning (ML) algorithm validation with a limited sample size is a critical issue in research involving high-dimensional data, such as neuroimaging, genomics, and motion tracking. Small sample sizes are common due to the high cost of data collection involving human participants and the difficulty in recruiting large numbers of participants. High-dimensional data with small sample sizes can lead to biased performance estimates in ML models. This study investigates the impact of validation methods on performance estimates and identifies strategies to avoid overfitting.
The authors reviewed published studies that applied ML to distinguish autistic from non-autistic individuals and found that smaller sample sizes were associated with higher reported classification accuracy. Their simulations showed that K-fold cross-validation (CV) produces strongly biased performance estimates with small sample sizes, and that the bias is still evident at a sample size of 1000. Nested CV and train/test split approaches produced robust, unbiased performance estimates regardless of sample size. Feature selection performed on pooled training and testing data contributed more to the bias than hyper-parameter tuning. The study also examined how data dimensionality, the size of the hyper-parameter space, and the number of CV folds contribute to the bias.
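To make the distinction between the validation schemes concrete, the sketch below (not the authors' code; it assumes scikit-learn, a linear SVM, and an illustrative C grid) contrasts plain K-fold CV, where the reported score comes from the same folds used to tune hyper-parameters, with nested CV, where tuning is confined to an inner loop and the outer folds are used only for evaluation.

```python
# Minimal sketch: plain K-fold CV with tuning on the same folds vs. nested CV.
# Assumes scikit-learn; the classifier and C grid are illustrative choices.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 200))       # 50 samples, 200 features: pure noise
y = rng.integers(0, 2, size=50)          # random binary labels

param_grid = {"C": [0.01, 0.1, 1, 10, 100]}
outer = KFold(n_splits=5, shuffle=True, random_state=0)
inner = KFold(n_splits=5, shuffle=True, random_state=1)

# Optimistic: best_score_ is the best score found while tuning on all the data,
# so evaluation and hyper-parameter selection share the same folds.
tuned = GridSearchCV(SVC(kernel="linear"), param_grid, cv=outer).fit(X, y)
print("K-fold CV (tuning and evaluation share folds):", tuned.best_score_)

# Nested CV: the outer folds never influence hyper-parameter selection,
# so the mean outer score is an unbiased estimate (~0.5 on noise).
# With a small grid like this the gap is typically modest, consistent with
# the finding that tuning leakage contributes less bias than feature selection.
nested = cross_val_score(
    GridSearchCV(SVC(kernel="linear"), param_grid, cv=inner), X, y, cv=outer
)
print("Nested CV:", nested.mean())
```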
The study used synthetic Gaussian noise data to investigate how sample size, the feature-to-sample ratio, and parameter tuning drive overfitting. The higher the feature-to-sample ratio, the more likely an ML model was to fit noise in the data and report spuriously high, above-chance accuracy. The number of tunable hyper-parameters also affected the likelihood of overfitting, though to a lesser degree.
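As a rough illustration of that kind of noise simulation (assuming scikit-learn; the linear SVM, univariate feature selection, and feature counts are illustrative choices rather than the paper's exact configuration), the sketch below shows apparent K-fold CV accuracy on pure Gaussian noise climbing above chance as the feature-to-sample ratio grows, when feature selection is performed on the pooled data before cross-validation.

```python
# Minimal sketch: feature selection on the full (pooled) dataset before K-fold CV
# lets a classifier score well above chance on pure noise, and the effect grows
# with the feature-to-sample ratio. Assumes scikit-learn; choices are illustrative.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_samples = 40
y = rng.integers(0, 2, size=n_samples)   # random binary labels

for n_features in (20, 100, 500, 2000):
    X = rng.standard_normal((n_samples, n_features))            # pure noise
    X_sel = SelectKBest(f_classif, k=10).fit_transform(X, y)    # leak: uses all samples
    acc = cross_val_score(SVC(kernel="linear"), X_sel, y, cv=5).mean()
    print(f"features/samples = {n_features / n_samples:5.1f} -> CV accuracy {acc:.2f}")
```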
Comparing the validation methods directly, only nested CV and train/test split produced unbiased performance estimates. Even on discriminable data containing a genuine signal, K-fold CV produced significantly higher performance estimates than nested CV or train/test split.
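A minimal sketch of the recommended train/test-split workflow follows (again assuming scikit-learn; the pipeline components and grid values are illustrative): feature selection and hyper-parameter tuning are fit only on the training portion, so the held-out test set plays no role in any modelling decision, and on noise data the resulting estimate stays near chance.

```python
# Minimal sketch: train/test split with feature selection and tuning confined
# to the training data via a Pipeline. Assumes scikit-learn; names are illustrative.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 500))          # high-dimensional noise data
y = rng.integers(0, 2, size=100)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

pipe = Pipeline([
    ("select", SelectKBest(f_classif, k=20)),   # fit on training folds only
    ("clf", SVC(kernel="linear")),
])
search = GridSearchCV(pipe, {"clf__C": [0.1, 1, 10]}, cv=5)
search.fit(X_train, y_train)                    # tuning sees only the training split

# The test set is touched exactly once, for the final estimate (~0.5 on noise).
print("Held-out test accuracy:", search.score(X_test, y_test))
```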
The study concludes that robust validation methods are essential for ML research, especially when working with small datasets. It also highlights the importance of separating training and testing data to avoid overfitting and provides guidance on interpreting results from other studies based on the validation method used.