Report 13/02 | Marta Sestelo, Nora M. Villanueva and Javier Roca-Pardiñas
The paper presents FWDselect, a new variable selection algorithm for regression models. The method uses a forward stepwise procedure to select the best subset of variables based on an information criterion, and applies bootstrap resampling to determine the number of covariates to include in the model. The algorithm is implemented in R and is tested on pollution data to predict SO₂ pollution incidents. The method is compared with other existing techniques, including regsubsets, step, and Lasso, and is shown to perform well in simulation studies. The FWDselect package provides functions for variable selection in linear, generalized linear, and generalized additive models, and allows for both numerical and graphical outputs.
The method is demonstrated using pollution data, where it successfully identifies the best temporal instants for predicting SO₂ pollution episodes. The results show that including a small number of relevant variables can significantly improve prediction accuracy, while including too many variables can degrade model performance. The paper concludes that FWDselect offers a practical and efficient solution for variable selection in regression models.
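
To make the forward stepwise idea summarized above concrete, here is a minimal sketch in plain Python. It is not the FWDselect code and does not use the package's API. It assumes an ordinary linear model with an intercept, uses AIC as a stand-in for the information criterion, and takes the subset size q as given rather than choosing it by bootstrap. The function name `forward_select` and the toy data are hypothetical.

```python
import math


def forward_select(columns, y, q):
    """Greedy forward selection of q covariates for a linear fit with intercept.

    columns: dict mapping a variable name to a list of floats (hypothetical input format).
    Returns (names in order of entry, AIC of the final model).
    """
    n = len(y)
    y_mean = sum(y) / n
    # Centering the response and the covariates accounts for the intercept.
    resid = [v - y_mean for v in y]
    cand = {}
    for name, col in columns.items():
        m = sum(col) / n
        cand[name] = [v - m for v in col]

    selected = []
    for _ in range(q):
        best_name, best_gain = None, 0.0
        for name, col in cand.items():
            norm2 = sum(c * c for c in col)
            if norm2 < 1e-12:  # nothing left in this column after orthogonalization
                continue
            dot = sum(c * r for c, r in zip(col, resid))
            gain = dot * dot / norm2  # drop in residual sum of squares if added
            if gain > best_gain:
                best_name, best_gain = name, gain
        if best_name is None:
            break
        chosen = cand.pop(best_name)
        norm2 = sum(c * c for c in chosen)
        # Remove the chosen direction from the residual ...
        coef = sum(c * r for c, r in zip(chosen, resid)) / norm2
        resid = [r - coef * c for r, c in zip(resid, chosen)]
        # ... and from the remaining candidates (Gram-Schmidt step).
        cand = {
            name: [k - (sum(c * v for c, v in zip(chosen, col)) / norm2) * c
                   for k, c in zip(col, chosen)]
            for name, col in cand.items()
        }
        selected.append(best_name)

    rss = sum(r * r for r in resid)
    k = len(selected) + 1  # slopes plus intercept
    # Gaussian AIC up to an additive constant; the floor avoids log(0) on a perfect fit.
    aic = n * math.log(max(rss / n, 1e-12)) + 2 * k
    return selected, aic


# Toy data (made up for illustration): y depends on x1 and x3 but not on x2.
n = 30
x1 = [float(i % 2) for i in range(n)]
x2 = [float(i % 3 - 1) for i in range(n)]
x3 = [float(i % 5 - 2) for i in range(n)]
noise = [0.01 * (i % 4 - 1.5) for i in range(n)]
y = [2 * a - 3 * c + e for a, c, e in zip(x1, x3, noise)]

columns = {"x1": x1, "x2": x2, "x3": x3}
selected, aic = forward_select(columns, y, q=2)
print(selected, round(aic, 1))
```

On this toy data the irrelevant covariate `x2` is left out when q = 2. The report describes choosing q by bootstrap resampling; that step is not reproduced here.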