16 December 2008 | Zoran Bursac*, C Heath Gauss, David Keith Williams and David W Hosmer
This article presents a purposeful selection algorithm for logistic regression, which automates the process of selecting variables for inclusion in a model. The algorithm is compared with three well-known variable selection methods in SAS PROC LOGISTIC: FORWARD, BACKWARD, and STEPWISE. The purpose of the algorithm is to retain significant covariates as well as important confounding variables, resulting in a potentially richer model. The algorithm is implemented as a SAS macro and tested on simulated data and the Hosmer and Lemeshow Worcester Heart Attack Study (WHAS) data set.
The purposeful selection process begins with a univariate analysis of each variable. Variables with significant univariate tests are selected as candidates for multivariate analysis. In the iterative process of variable selection, a covariate is removed from the model if it is neither significant nor a confounder. Significance is evaluated at the 0.1 alpha level, and confounding is defined as a change in any remaining parameter estimate greater than 15% or 20% compared to the full model. After this process, each variable not selected for the original multivariate model is added back, one at a time, to the model containing the significant covariates and confounders retained earlier. This step helps identify variables that may not be significant on their own but are important in the presence of other variables.
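The steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' SAS macro. The `fit` callable is hypothetical: it stands in for any logistic regression routine that returns a coefficient and a p-value for each variable. The 0.25 univariate screening cut-off is an assumed default that the summary above does not state. "The full model" is read here as the model containing all candidates, and the add-back step is simplified to a single pass.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical interface: a fitter takes variable names and returns
# {name: (coefficient, p_value)} for the logistic model with those variables.
Fit = Dict[str, Tuple[float, float]]
Fitter = Callable[[List[str]], Fit]


def max_pct_change(full: Fit, reduced: Fit) -> float:
    """Largest absolute percent change, relative to the full model,
    among coefficients that remain in the reduced model."""
    changes = [
        abs(reduced[name][0] - full[name][0]) / abs(full[name][0]) * 100.0
        for name in reduced
        if name in full and full[name][0] != 0.0
    ]
    return max(changes, default=0.0)


def purposeful_selection(
    variables: List[str],
    fit: Fitter,
    candidate_p: float = 0.25,     # assumed univariate screening level
    retain_p: float = 0.10,        # significance level for staying in the model
    confound_pct: float = 15.0,    # percent change that flags a confounder
    noncandidate_p: float = 0.15,  # level for adding back non-candidates
) -> List[str]:
    # Step 1: univariate screen, one single-variable model per covariate.
    univariate_p = {v: fit([v])[v][1] for v in variables}
    candidates = [v for v in variables if univariate_p[v] < candidate_p]
    non_candidates = [v for v in variables if univariate_p[v] >= candidate_p]

    # Step 2: starting from all candidates, try to drop the least significant
    # covariate; keep it if dropping it shifts another estimate too much.
    full = fit(candidates) if candidates else {}
    model = list(candidates)
    confounders = set()
    while model:
        current = fit(model)
        removable = [
            v for v in model
            if current[v][1] >= retain_p and v not in confounders
        ]
        if not removable:
            break
        worst = max(removable, key=lambda v: current[v][1])
        reduced = [v for v in model if v != worst]
        if reduced and max_pct_change(full, fit(reduced)) > confound_pct:
            confounders.add(worst)  # not significant, but a confounder
        else:
            model = reduced

    # Step 3: add back each non-candidate, one at a time, and keep it if it
    # is significant alongside the variables already retained.
    for v in non_candidates:
        if fit(model + [v])[v][1] < noncandidate_p:
            model.append(v)
    return model
```

In practice the fitter could wrap any logistic regression routine that reports coefficient estimates and Wald p-values. Passing it in as a callable keeps the selection logic independent of the fitting library.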
The algorithm was tested in two simulation studies. In the first, six equally important covariates were considered, with three significant and three not. In the second, six equally important covariates were considered, with two significant, one a confounder, and three not significant. The results showed that the purposeful selection algorithm performed better than the other three methods, especially in cases where the significance level of a confounder was between 0.1 and 0.15. The algorithm was also tested on the WHAS data set, where it retained important confounders and variables that other methods did not.
The study concludes that the purposeful selection algorithm is a useful tool for variable selection in logistic regression, particularly when the analyst is interested in risk factor modeling rather than just prediction. The algorithm is recommended for use with a confounding level of 15% and a non-candidate inclusion level of 0.15 to improve the retention of meaningful confounders. The algorithm is implemented as a SAS macro and is available for use.