2024 | Alessandro Sebastianelli, Dario Spiller, Raquel Carmo, James Wheeler, Artur Nowakowski, Ludmilla Viana Jacobson, Dohyung Kim, Hanoch Barlevi, Zoraya El Raiss Cordero, Felipe J Colón-González, Rachel Lowe, Silvia Liberata Ullo, Rochelle Schneider
This study proposes a reproducible ensemble machine learning (ML) approach to forecast dengue incidence rates (DIR) in Brazil, with a focus on children under 19 years old. The model integrates spatial and temporal information to provide one-month-ahead DIR estimates at the state level. Comparative analyses against a dummy model, together with ablation studies, demonstrate the ensemble model's efficacy across the 27 Brazilian Federal Units (FUs). The approach is also applied successfully to Peru, showing that it transfers to other settings in practice.
The study highlights the importance of integrating advanced analytics into public health operational frameworks and emphasizes collaboration with intergovernmental organizations and public health institutions. The ensemble combines CatBoost, SVM, and LSTM models, leveraging their complementary strengths to capture complex correlations and dynamics. The dataset includes eco-climatic, environmental, and population variables, which enhance the model's predictive capabilities. The results show that the ensemble outperforms the single models in most cases, with lower uncertainty, and performs best in regions with stable seasonality. The study also addresses limitations in handling extreme values and provides a template for implementing advanced analytical methods in public health.
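As a rough illustration of the comparison described above (an ensemble of several forecasters evaluated against a dummy model for one-month-ahead prediction), the following Python sketch may help. It rests on assumptions that the summary does not confirm: the ensemble is taken to be a plain average of the base-model forecasts, and the dummy model is taken to be a persistence forecast (next month equals the latest observed month). All function names are hypothetical, the data are synthetic, and the three prediction arrays are simple stand-ins for the outputs of CatBoost, SVM, and LSTM models rather than fitted models.

```python
import numpy as np


def make_lagged(series, n_lags):
    """Build (X, y) pairs: n_lags consecutive months as features, the next month as target."""
    X = np.array([series[i:i + n_lags] for i in range(len(series) - n_lags)])
    y = np.array(series[n_lags:])
    return X, y


def ensemble_mean(predictions):
    """Combine per-model forecasts by simple averaging (assumed rule, not the study's)."""
    return np.mean(np.stack(predictions, axis=0), axis=0)


def persistence_baseline(X):
    """Assumed dummy model: forecast next month as the most recent observed month."""
    return X[:, -1]


def rmse(y_true, y_pred):
    """Root mean squared error between observed and forecast values."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


if __name__ == "__main__":
    # Synthetic monthly series with a yearly cycle (illustrative only, not real DIR data).
    rng = np.random.default_rng(0)
    months = np.arange(72)
    series = 50 + 30 * np.sin(2 * np.pi * months / 12) + rng.normal(0, 5, size=72)
    X, y = make_lagged(series, n_lags=12)

    # Stand-ins for three base learners; in practice these would be fitted models.
    pred_a = X[:, -1]        # last observed month
    pred_b = X.mean(axis=1)  # mean of the previous 12 months
    pred_c = X[:, 0]         # same month one year earlier

    print("ensemble RMSE:", rmse(y, ensemble_mean([pred_a, pred_b, pred_c])))
    print("dummy RMSE:   ", rmse(y, persistence_baseline(X)))
```

Running the script prints the two error values side by side. With real data, each stand-in would be replaced by a trained model's one-month-ahead forecast for a given Federal Unit, and the averaging step could be swapped for whatever combination rule the ensemble actually uses.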