ADA Research Group

Automated Machine Learning for COVID-19 Forecasting


Accurate forecasting of the spread of pandemics is necessary for policy makers to respond to them adequately. In response to the COVID-19 outbreak, various sophisticated epidemic and machine learning models were deployed to take on this task. These models, however, rely on expert knowledge, carefully selected architectures and detailed data that is often only available for specific regions. Automated machine learning (AutoML) tackles this issue by automatically creating pipelines in a data-driven manner, resulting in high-quality predictions. In this work we adapt the AutoML framework auto-sklearn to the time series forecasting task. We compare two methods for multi-step-ahead forecasting: a multi-output approach and a repeated single-output approach. We also study the usefulness of the open mobility data sets published by Apple and Google as a complement to the open incidence data set of the ECDC. To combat concept drift, we experiment with three drift adaptation strategies: refitting our models on part of the data, refitting them on the full data, or retraining them completely. We compare our methods with six baselines on two sets of countries: a global set composed of 58 countries around the world and a European set composed of 26 countries. We evaluate and compare the performance of the methods in early, intermediate and late forecasting scenarios. We find that a simple persistence baseline is a strong competitor for this task. Our results over these three scenarios, which are separated in time, show that the comparative performance of our models increases as more data becomes available. In the late forecasting scenario, our best method, a multi-output ensemble refitted on recent data and using Google mobility data alongside incidence data, outperforms all other methods and baselines for each country.
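The difference between the two multi-step-ahead strategies, and the persistence baseline they are compared against, can be illustrated with a small sketch. This is not the code from the paper: it uses plain Python rather than auto-sklearn, and the function names (`make_windows`, `single_output_targets`, `persistence_forecast`), the window length and the toy series are all assumptions made for illustration. It also assumes the persistence baseline repeats the last observed value, which is the usual meaning of the term.

```python
from typing import List, Sequence, Tuple


def make_windows(
    series: Sequence[float], n_lags: int, horizon: int
) -> Tuple[List[List[float]], List[List[float]]]:
    """Slice a series into lagged inputs X and multi-step targets Y.

    Each row of X holds `n_lags` consecutive past values; the matching
    row of Y holds the next `horizon` values. A multi-output model
    would be trained on (X, Y) directly.
    """
    X, Y = [], []
    for t in range(n_lags, len(series) - horizon + 1):
        X.append(list(series[t - n_lags:t]))
        Y.append(list(series[t:t + horizon]))
    return X, Y


def single_output_targets(Y: List[List[float]], step: int) -> List[float]:
    """Targets for one model in the repeated single-output strategy.

    One separate model is trained per forecast step; `step` is 1-based,
    so step=1 is the next observation.
    """
    return [row[step - 1] for row in Y]


def persistence_forecast(history: Sequence[float], horizon: int) -> List[float]:
    """Assumed persistence baseline: repeat the last observed value."""
    return [history[-1]] * horizon


def mean_absolute_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Toy incidence series (hypothetical numbers, not ECDC data).
series = [10, 12, 15, 14, 18, 21, 25, 24]
X, Y = make_windows(series, n_lags=3, horizon=2)

# Multi-output: one model learns X -> Y (both steps at once).
# Repeated single-output: one model per step, each with its own target vector.
targets_per_step = [single_output_targets(Y, step) for step in (1, 2)]

# Persistence forecast for the first window, scored against its true targets.
baseline = persistence_forecast(X[0], horizon=2)
baseline_error = mean_absolute_error(Y[0], baseline)
```

In this framing both strategies share the same input windows and differ only in how the targets are presented to the learner, which is why either one could in principle be wrapped around the same underlying regressor.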



AutoML for COVID-19 Forecasting is published on GitHub and is also available as a downloadable zip file (May 2021).

The original AutoML system used is auto-sklearn.


The experimental evaluation made use of the following data sources: