Assessment of PLSDA cross validation

24 January 2008 | Johan A. Westerhuis · Huub C. J. Hoefsloot · Suzanne Smit · Daniel J. Vis · Age K. Smilde · Ewoud J. J. van Velzen · John P. M. van Duijnhooven · Ferdi A. van Dorsten
This paper discusses how to assess Partial Least Squares Discriminant Analysis (PLSDA) classification models with cross-validation in metabolomics research. The authors highlight a central difficulty in classifying groups of individuals by their metabolic profiles: the number of samples is small compared with the number of variables. Rigorous validation is therefore needed to prevent overfitting and to ensure that classification results are reliable.

The paper introduces a strategy for validating PLSDA classification models based on cross-model validation and permutation testing. Permutation tests generate a reference distribution under the null hypothesis that no difference exists between the classes. Cross-model validation assesses how much the model parameters vary and how that variability affects prediction accuracy.

Key findings include:

1. **Cross-Model Validation**: Proper cross-validation, particularly double cross-validation (2CV), is essential for reliable model validation. Single cross-validation (1CV) and single cross-validation with a leave-one-out approach (FIT) lead to overoptimistic results.
2. **Permutation Testing**: Permutation tests provide a reference against which the Q² value can be judged and help determine whether classification results are statistically significant. They show that the original classification is significantly better than random class assignments.
3. **Final Calibration Model**: Instead of a single final model, a group of slightly different models can be used to obtain a range of class membership predictions, which provides a confidence measure for each class membership assignment.

The authors conclude that PLSDA score plots should not be used to infer class differences, because they give an overoptimistic picture. Class predictions based on the multiple models developed during cross-validation are more informative and reliable.
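The double cross-validation and model-group ideas above can be sketched in plain Python. This is a minimal, hypothetical illustration, not the authors' code. It assumes a two-class problem with labels coded 0/1, a PLS1 model fitted with NIPALS, a classification threshold of 0.5, the misclassification count as the inner-loop criterion for choosing the number of components, and synthetic data. All function names (`pls1_fit`, `pls1_predict`, `double_cv`, `make_data`) and parameter values are illustrative assumptions.

```python
import random

def pls1_fit(X, y, ncomp):
    """Fit a PLS1 model (single response) with NIPALS on mean-centred data."""
    n, p = len(X), len(X[0])
    xm = [sum(r[j] for r in X) / n for j in range(p)]
    ym = sum(y) / n
    E = [[v - m for v, m in zip(r, xm)] for r in X]  # centred X, deflated below
    f = [v - ym for v in y]                           # centred y, deflated below
    comps = []
    for _ in range(ncomp):
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]
        norm = sum(v * v for v in w) ** 0.5
        if norm < 1e-10:
            break  # no covariance left between X and y
        w = [v / norm for v in w]                          # weight vector
        t = [sum(a * b for a, b in zip(r, w)) for r in E]  # scores
        tt = sum(v * v for v in t)
        pl = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]  # X loadings
        q = sum(a * b for a, b in zip(f, t)) / tt          # y loading
        E = [[e - ti * pj for e, pj in zip(r, pl)] for r, ti in zip(E, t)]
        f = [fi - q * ti for fi, ti in zip(f, t)]
        comps.append((w, pl, q))
    return {"xm": xm, "ym": ym, "comps": comps}

def pls1_predict(model, x):
    """Predict y for one sample by applying the same deflation steps."""
    e = [v - m for v, m in zip(x, model["xm"])]
    yhat = model["ym"]
    for w, pl, q in model["comps"]:
        t = sum(a * b for a, b in zip(e, w))
        yhat += q * t
        e = [ei - t * pj for ei, pj in zip(e, pl)]
    return yhat

def misclassified(model, X, y, idx):
    """Count wrong predictions in idx (assumed 0/1 coding, threshold 0.5)."""
    return sum((pls1_predict(model, X[i]) > 0.5) != (y[i] == 1) for i in idx)

def folds(items, k, rng):
    """Split a list of sample indices into k random folds."""
    items = list(items)
    rng.shuffle(items)
    return [items[i::k] for i in range(k)]

def double_cv(X, y, max_comp=3, k_outer=5, k_inner=4, seed=0):
    """2CV: inner CV picks the number of components, outer CV measures error."""
    rng = random.Random(seed)
    total_err, models = 0, []
    for test in folds(range(len(y)), k_outer, rng):
        train = [i for i in range(len(y)) if i not in test]
        inner = folds(train, k_inner, rng)

        def inner_err(a):  # uses training samples of this outer fold only
            err = 0
            for val in inner:
                tr = [i for i in train if i not in val]
                m = pls1_fit([X[i] for i in tr], [y[i] for i in tr], a)
                err += misclassified(m, X, y, val)
            return err

        best_a = min(range(1, max_comp + 1), key=inner_err)  # ties -> fewer comps
        model = pls1_fit([X[i] for i in train], [y[i] for i in train], best_a)
        total_err += misclassified(model, X, y, test)  # test fold never used above
        models.append(model)
    return total_err, models

def make_data(n=40, p=10, shift=2.0, seed=1):
    """Synthetic two-class data (not the paper's): class 1 shifted in 2 variables."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) + (shift * (i % 2) if j < 2 else 0.0) for j in range(p)]
         for i in range(n)]
    return X, [i % 2 for i in range(n)]

X, y = make_data()
err, models = double_cv(X, y)
print("2CV misclassifications:", err, "of", len(y))

X_new, y_new = make_data(n=2, seed=99)  # X_new[1] is a fresh class-1 sample
preds = [pls1_predict(m, X_new[1]) for m in models]
print("range of predictions:", round(min(preds), 2), "to", round(max(preds), 2))
```

The outer test folds are never used to choose the number of components, which is what separates 2CV from 1CV in this sketch. The spread of `preds` across the outer-loop models plays the role of the range of class membership predictions described in the third finding.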
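The permutation test can be sketched in the same hedged way. Ideally, the statistic recomputed for every permuted labeling would be the complete cross-validated PLSDA error, for example from a double cross-validation. To keep this sketch short and self-contained, a leave-one-out nearest-centroid classifier stands in for it. The function names, the one-sided p-value formula (one plus the number of permutations at least as good, divided by one plus the number of permutations), the 199 permutations and the synthetic data are all assumptions for illustration.

```python
import random

def loo_centroid_errors(X, y):
    """Stand-in statistic (not PLSDA): leave-one-out nearest-centroid errors."""
    err = 0
    for i in range(len(y)):
        cents = {}
        for c in sorted(set(y)):
            rows = [X[j] for j in range(len(y)) if j != i and y[j] == c]
            if rows:
                cents[c] = [sum(col) / len(rows) for col in zip(*rows)]
        nearest = min(cents, key=lambda c: sum((a - b) ** 2 for a, b in zip(X[i], cents[c])))
        err += nearest != y[i]
    return err

def permutation_test(X, y, statistic, n_perm=199, seed=0):
    """Compare the observed error with errors obtained after shuffling the labels."""
    rng = random.Random(seed)
    observed = statistic(X, y)
    null = []
    for _ in range(n_perm):
        y_perm = list(y)
        rng.shuffle(y_perm)  # breaks any real relationship between X and y
        null.append(statistic(X, y_perm))
    # one-sided p-value: share of labelings that classify at least as well
    p_value = (1 + sum(s <= observed for s in null)) / (n_perm + 1)
    return observed, null, p_value

def make_data(n=30, p=10, shift=2.0, seed=1):
    """Synthetic two-class data (not the paper's): class 1 shifted in 2 variables."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) + (shift * (i % 2) if j < 2 else 0.0) for j in range(p)]
         for i in range(n)]
    return X, [i % 2 for i in range(n)]

X, y = make_data()
observed, null, p_value = permutation_test(X, y, loo_centroid_errors)
print("observed errors:", observed, "permutation p-value:", round(p_value, 3))
```

If the observed error falls inside the bulk of `null`, the apparent class separation is no better than chance. Using a PLSDA-based error instead only requires passing a different `statistic` function.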