Georg Heinze, Christine Wallisch, Daniela Dunkler (2018)
Variable selection is a critical aspect of statistical modeling, particularly in fields such as the life sciences, where analyses often involve a moderate number of candidate variables (10–30). This review discusses variable selection methods based on significance criteria, information criteria, penalized likelihood, the change-in-estimate criterion, and background knowledge. These methods were typically developed for linear regression models and later adapted to generalized linear models and survival analysis. However, variable selection can compromise model stability, the unbiasedness of regression coefficients, and the validity of p-values and confidence intervals. The review offers practical recommendations for statisticians applying variable selection in low-dimensional modeling and for investigating model stability, and it suggests reporting quantities obtained by resampling the entire variable selection process to ensure transparency and reliability. A key consideration is the events-per-variable (EPV) ratio, which balances the amount of information in the data against the number of parameters to be estimated. The review emphasizes background knowledge, stability investigations, and shrinkage techniques as ways to improve model performance and interpretability. Specific recommendations include preferring backward elimination (BE) over forward selection (FS), attending to the EPV ratio, and using resampling methods to assess model stability. The review concludes that variable selection should be applied judiciously, with a focus on model interpretability and accuracy, and that robust methods are needed to address the challenges of high-dimensional data and model uncertainty.
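To make the recommended procedure concrete, here is a minimal sketch of AIC-based backward elimination for a linear model, written with plain NumPy. The function names and the simulated demo data are illustrative assumptions, not taken from the review; the review itself discusses BE with several possible criteria (significance levels, AIC, BIC), of which AIC is used here.

```python
import numpy as np

def linear_aic(X, y):
    """AIC of an OLS fit with intercept: n*log(RSS/n) + 2*(k+1)."""
    n = len(y)
    Xd = np.column_stack([np.ones(n), X])  # add intercept column
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    rss = np.sum((y - Xd @ beta) ** 2)
    return n * np.log(rss / n) + 2 * Xd.shape[1]

def backward_eliminate(X, y, names):
    """Repeatedly drop the variable whose removal lowers AIC the most;
    stop when no single removal improves the criterion."""
    keep = list(range(X.shape[1]))
    current = linear_aic(X[:, keep], y)
    while len(keep) > 1:
        candidates = []
        for j in keep:
            rest = [k for k in keep if k != j]
            candidates.append((linear_aic(X[:, rest], y), j))
        best_aic, drop = min(candidates)
        if best_aic < current:
            keep.remove(drop)
            current = best_aic
        else:
            break
    return [names[j] for j in keep]

# Hypothetical demo: y depends on x1 and x2 only; x3 and x4 are noise.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))
y = 3 * X[:, 0] + 2 * X[:, 1] + rng.normal(size=200)
selected = backward_eliminate(X, y, ["x1", "x2", "x3", "x4"])
```

With strong true effects, the informative variables survive elimination; noise variables are usually, though not always, dropped, which is exactly the instability the review's resampling diagnostics are meant to expose.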
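The suggestion to resample the entire selection process can be sketched as bootstrap inclusion frequencies: repeat the selection on bootstrap resamples and report how often each variable is retained. For brevity this sketch uses a crude significance-style selector (keep variables with |t| > 2 in the full model); the selector, threshold, and data are illustrative assumptions, not the review's specific procedure.

```python
import numpy as np

def select_significant(X, y, t_cut=2.0):
    """Crude significance-based selection: fit the full OLS model
    and keep variables whose |t|-statistic exceeds t_cut."""
    n, p = X.shape
    Xd = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    resid = y - Xd @ beta
    sigma2 = resid @ resid / (n - p - 1)
    cov = sigma2 * np.linalg.inv(Xd.T @ Xd)
    t = beta[1:] / np.sqrt(np.diag(cov)[1:])  # skip intercept
    return np.abs(t) > t_cut

def inclusion_frequencies(X, y, n_boot=200, seed=0):
    """Re-run the whole selection on bootstrap resamples and report
    how often each variable is selected (a stability diagnostic)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    counts = np.zeros(X.shape[1])
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)  # resample rows with replacement
        counts += select_significant(X[idx], y[idx])
    return counts / n_boot

# Hypothetical demo: only x1 has a true effect; x2, x3 are noise.
rng = np.random.default_rng(7)
X = rng.normal(size=(150, 3))
y = 3 * X[:, 0] + rng.normal(size=150)
freqs = inclusion_frequencies(X, y)
```

A strongly predictive variable ends up with an inclusion frequency near 1, while noise variables appear only sporadically; reporting these frequencies alongside the selected model is the kind of transparency the review advocates.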