Ensemble Approaches for Robust and Generalizable Short-Term Forecasts of Dengue Fever. A retrospective and prospective evaluation study in over 180 locations around the world

Skyler Wu, Austin Meyer, Leonardo Clemente, Lucas M. Stolerman, Fred Lu, Atreyee Majumder, Rudi Verbeeck, Serge Masyn, Mauricio Santillana

Abstract

Dengue fever, a tropical vector-borne disease, is a leading cause of hospitalization and death in many parts of the world, especially in Asia and Latin America. In places where timely and accurate dengue activity surveillance is available, decision-makers possess valuable information that may allow them to better design and implement public health measures and improve the allocation of limited public health resources. In addition, robust and reliable near-term forecasts of likely epidemic outcomes may further help anticipate increased demand on healthcare infrastructure and may promote a culture of preparedness. Here, we propose ensemble modeling approaches that combine forecasts produced with a variety of independent mechanistic, statistical, and machine learning component models to forecast reported dengue case counts 1, 2, and 3 months ahead at the province level in multiple countries. We assess the monthly predictive ability of the ensembles and of each component model in a fully out-of-sample, retrospective fashion in over 180 locations around the world (all provinces of Brazil, Colombia, Malaysia, Mexico, and Thailand, as well as Iquitos, Peru, and San Juan, Puerto Rico) over at least 2-3 years. Additionally, we evaluate ensemble approaches in a multi-model, real-time, prospective dengue forecasting platform, where issues of data availability and data completeness introduce important limitations, during an 11-month period in 2022 and 2023. We show that our ensemble modeling approaches lead to reliable and robust predictions when compared to baseline estimates produced with the information available at the time of prediction. This contrasts with the high variability in the forecasting ability of each individual component model across locations and time. Furthermore, we find that no individual model leads to optimal and robust predictions across time horizons and locations. While the ensemble models do not always achieve the best prediction performance in any given location, they consistently provide reliable disease estimates, ranking among the top 3 performing models across locations and time periods, both retrospectively and prospectively.
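
As a rough illustration of the idea summarized above, the sketch below combines several component forecasts into one ensemble forecast and compares it against a baseline built only from information available at prediction time. The abstract does not specify the combination rule, the baseline, or the error metric, so the median combination, the persistence baseline, the mean absolute error, and all names and numbers here are assumptions made for illustration, not the authors' method.

```python
import numpy as np


def median_ensemble(component_forecasts):
    """Combine component forecasts with a per-location, per-horizon median.

    component_forecasts has shape (n_models, n_locations, n_horizons).
    The median rule is an assumption; the abstract does not state the rule.
    """
    return np.median(np.asarray(component_forecasts, dtype=float), axis=0)


def persistence_baseline(last_observed, n_horizons=3):
    """Hypothetical baseline: repeat the latest observed count at every horizon."""
    last = np.asarray(last_observed, dtype=float)
    return np.repeat(last[:, None], n_horizons, axis=1)


def mean_absolute_error(forecast, observed):
    """Average absolute difference between forecast and observed counts."""
    diff = np.asarray(forecast, dtype=float) - np.asarray(observed, dtype=float)
    return float(np.mean(np.abs(diff)))


# Toy monthly case counts, invented for illustration only.
# Axes: 3 hypothetical component models x 2 locations x 3 horizons (1, 2, 3 months ahead).
component_forecasts = np.array([
    [[120, 150, 170], [30, 35, 40]],
    [[100, 110, 115], [45, 50, 60]],
    [[140, 160, 200], [25, 28, 30]],
])
observed = np.array([[118, 152, 180], [32, 36, 41]])
last_observed = np.array([100, 30])  # latest counts known at prediction time

ensemble = median_ensemble(component_forecasts)
baseline = persistence_baseline(last_observed)

ensemble_mae = mean_absolute_error(ensemble, observed)
baseline_mae = mean_absolute_error(baseline, observed)
print(f"ensemble MAE: {ensemble_mae:.2f}, baseline MAE: {baseline_mae:.2f}")
```

A median is used here only because it is insensitive to a single badly behaved component model; a mean or a weighted combination would slot into the same place in the sketch.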

Related publications