Report 13/02 | Marta Sestelo, Nora M. Villanueva and Javier Roca-Pardiñas
The paper introduces the FWDselect package, which implements a new algorithm for variable selection in regression models. The package is designed to help users select relevant variables and determine the optimal number of variables to include in a model. The method is based on a forward stepwise procedure that selects the best subset of variables using an information criterion, such as cross-validation. Bootstrap techniques are used to determine the minimum number of significant variables. The package includes functions for selecting variables in linear, generalized linear, and generalized additive models. The performance of the method is evaluated through simulation studies and compared with other existing methodologies, including the *regsubsets* function, the *step* function, and the Lasso method.
The package is demonstrated using a real-world example of predicting atmospheric SO2 pollution incidents, where the optimal subset of variables is selected to improve the predictive capability of the model. The results show that the proposed method effectively identifies the most relevant variables and determines the minimum number of variables needed for accurate predictions.
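
To make the forward stepwise idea concrete, here is a minimal sketch in Python, not the package's own R code. It assumes a plain greedy forward search for a linear model, scored by k-fold cross-validated mean squared error. The function names `cv_mse` and `forward_select` and the simulated data are hypothetical, and the procedure implemented in FWDselect may differ in its details. The bootstrap test for the number of variables is not covered here.

```python
import numpy as np

def cv_mse(X, y, cols, n_folds=5):
    """Cross-validated mean squared error of an OLS fit on columns `cols`."""
    n = len(y)
    folds = np.array_split(np.arange(n), n_folds)
    errors = []
    for test_idx in folds:
        train = np.ones(n, dtype=bool)
        train[test_idx] = False
        # Design matrices with an intercept column.
        A_train = np.column_stack([np.ones(train.sum()), X[train][:, cols]])
        A_test = np.column_stack([np.ones(len(test_idx)), X[test_idx][:, cols]])
        beta, *_ = np.linalg.lstsq(A_train, y[train], rcond=None)
        errors.append(np.mean((y[test_idx] - A_test @ beta) ** 2))
    return float(np.mean(errors))

def forward_select(X, y, q, n_folds=5):
    """Greedily add, q times, the variable that most lowers the CV error."""
    selected = []
    remaining = list(range(X.shape[1]))
    for _ in range(q):
        scores = {j: cv_mse(X, y, selected + [j], n_folds) for j in remaining}
        best = min(scores, key=scores.get)
        selected.append(best)
        remaining.remove(best)
    return selected, cv_mse(X, y, selected, n_folds)

# Simulated data (an assumption for illustration): only columns 1 and 4 matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = 3.0 * X[:, 1] - 2.0 * X[:, 4] + rng.normal(scale=0.5, size=200)

chosen, err = forward_select(X, y, q=2)
print(sorted(chosen), round(err, 3))
```

In this sketch the subset size `q` is fixed by the caller. Choosing `q` itself is the part the paper addresses with bootstrap resampling.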