Understanding Partial Least Squares

This chapter discusses the application of Principal Components Analysis (PCA) and Partial Least Squares (PLS) to data with a large number of variables, which traditional statistical tests such as MANOVA and MANCOVA handle poorly. The primary objective is to assess the performance of these two methods in a simulated example involving gene expression data and drug efficacy scores from 250 patients.

- **Background**: Traditional statistical tests are inadequate for managing a large number of variables. Add-up scores are a simple way to reduce the number of variables, but they do not account for variable importance, interactions, or differences in units. PCA and PLS address these issues but are rarely used in clinical trials.
- **Objective**: To evaluate the performance of PCA and PLS.
- **Methods**: A simulated dataset of gene expression data and drug efficacy scores from 250 patients was used. PCA was performed using SPSS, and PLS using R Partial Least Squares.
- **Results**: Three novel predictor variables were constructed from 27 variables. PCA identified these variables as highly significant predictors, with t-values of 10.2, 21.6, and 6.7 (p < 0.001). PLS also included the outcome variables in the model and predicted them with lower significance (t-values of 6.8, 16.2, and 3.5, p < 0.001). Traditional multiple linear regression with add-up scores showed a further reduction in significance (t-values of 3.4, 11.2, and 2.4, p < 0.002).
- **Conclusions**:
  1. PCA and PLS can handle more variables than standard covariance methods and are more sensitive.
  2. These methods account for the relative importance of variables, their interactions, and differences in their units.
  3. They are flexible, allowing manifest variables to be used twice: first as clusters for prediction and second as unclustered outcome variables.
  4. PLS is more parsimonious than PCA because it can include outcome variables in the model.
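The background point about add-up scores and unit differences can be illustrated with a small simulation. The sketch below is not the chapter's dataset or its SPSS analysis: the data, the variable names, and the choice of nine variables are hypothetical. Eight variables share one underlying factor, and a ninth is pure noise measured in much larger units. The add-up score is dominated by the large-unit variable, whereas the first principal component of the standardized variables is not.

```python
import numpy as np

# Hypothetical simulated data, not the chapter's 250-patient dataset.
rng = np.random.default_rng(0)
n_patients = 250

# One underlying factor drives eight "informative" variables.
latent = rng.normal(size=n_patients)
informative = latent[:, None] + 0.5 * rng.normal(size=(n_patients, 8))

# A ninth variable is pure noise but is recorded in much larger units.
noisy_large_unit = 50.0 * rng.normal(size=(n_patients, 1))
X = np.hstack([informative, noisy_large_unit])

# Add-up score: a plain sum that ignores units and variable importance.
add_up = X.sum(axis=1)

# PCA on standardized variables: the first right singular vector of the
# standardized data matrix gives the weights of the first component.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
_, _, Vt = np.linalg.svd(Z, full_matrices=False)
pc1 = Z @ Vt[0]

# Absolute correlations, because the sign of a principal component is arbitrary.
corr_add_up = abs(np.corrcoef(add_up, latent)[0, 1])
corr_pc1 = abs(np.corrcoef(pc1, latent)[0, 1])

print(f"add-up score vs. underlying factor: |r| = {corr_add_up:.2f}")
print(f"first component vs. underlying factor: |r| = {corr_pc1:.2f}")
```

Standardizing before the decomposition is what removes the unit problem here; the component weights then reflect how strongly each variable moves with the others rather than how large its numbers are.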
Clinical trials often involve a large number of variables, such as gene expressions, repeated measurements, and multi-item personal scores. While MANOVA and MANCOVA can handle thousands of cases, they struggle with more than two or three dependent variables. PCA and PLS provide a more robust approach by identifying underlying factors (components or latent variables) and accounting for variable interactions and differences.
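To make the idea of latent variables concrete, here is a minimal sketch of PLS for a single outcome, using the NIPALS scheme. It is an illustration under stated assumptions, not the chapter's R Partial Least Squares analysis: the function name `pls1_nipals`, the simulated 27 predictors in three clusters, and the outcome coefficients are all hypothetical. Unlike PCA, each PLS weight vector is computed from the covariance between the predictors and the outcome, so the outcome takes part in building the components.

```python
import numpy as np


def pls1_nipals(X, y, n_components):
    """PLS for one outcome via NIPALS; returns scores T, weights W, y-loadings q."""
    Xk = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # standardize predictors
    yk = y - y.mean()                                   # center the outcome
    n, n_vars = Xk.shape
    T = np.zeros((n, n_components))
    W = np.zeros((n_vars, n_components))
    q = np.zeros(n_components)
    for k in range(n_components):
        w = Xk.T @ yk                  # direction of maximal covariance with y
        w /= np.linalg.norm(w)
        t = Xk @ w                     # latent variable (score) for this component
        tt = t @ t
        x_load = Xk.T @ t / tt         # predictor loadings
        q[k] = yk @ t / tt             # regression of y on the score
        Xk = Xk - np.outer(t, x_load)  # deflate predictors
        yk = yk - q[k] * t             # deflate outcome
        T[:, k], W[:, k] = t, w
    return T, W, q


# Hypothetical data: 250 patients, 27 predictors in three clusters of nine.
rng = np.random.default_rng(1)
n_patients = 250
factors = rng.normal(size=(n_patients, 3))
X = np.repeat(factors, 9, axis=1) + 0.7 * rng.normal(size=(n_patients, 27))
y = factors @ np.array([1.0, 2.0, 0.5]) + rng.normal(size=n_patients)

T, W, q = pls1_nipals(X, y, n_components=3)

# Share of outcome variance explained by the three latent variables.
y_centered = y - y.mean()
fitted = T @ q
r2 = 1.0 - ((y_centered - fitted) ** 2).sum() / (y_centered ** 2).sum()
print(f"R^2 with three PLS components: {r2:.2f}")
```

The deflation step makes successive scores orthogonal, so each latent variable can be entered into a regression model without collinearity among the components.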

Chapter 16 Partial Least Squares

2013 | T.J. Cleophas and A.H. Zwinderman