With the onset of the COVID-19 outbreak, we look at the data and use a simple model inspired by China’s example to predict when the outbreak will abate in various countries.

Forecasts: https://www.codoma.tech/blog/covid19-forecast/ (updated daily)

The forecasts suggest that every one of us needs to keep following the measures (social distancing, diligent hygiene, etc.) for around 70 days for the pandemic to abate.

Let’s stick together (metaphorically, remember social distancing) and stop the pandemic! Stay safe everyone!

Method

As the first country hit, China took strong measures in response to the pandemic. Although that response was criticized in the beginning, most countries have now followed suit. China therefore serves as a plausible reference for how the outbreak develops in other countries.

When we examined the data, the number of new cases in China looked very similar to the well-known sigmoid function.

We follow an (admittedly simple) model: we fit a parameterized sigmoid function over the data from each country to predict:

  • response date: the date when the country effectively started responding to the outbreak
  • recession date: the first date on which the number of new cases falls below 5 per day

Note: we created our own metrics here; please enlighten us if you have sounder ones.
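As an illustration of the recession-date metric, here is a minimal sketch in Python. The function name `recession_date`, its arguments, and the choice to search only from the peak onward are assumptions made for this example (without that last choice, the quiet days at the very start of an outbreak would already count as "below 5 per day"); the post itself does not say how this case is handled.

```python
from datetime import date, timedelta

def recession_date(start, daily_new_cases, threshold=5.0):
    """Return the first date at or after the peak on which new cases drop
    below `threshold`, or None if that never happens in the series.

    `start` is the date of the first entry in `daily_new_cases`.
    Searching from the peak onward is an assumption of this sketch.
    """
    if not daily_new_cases:
        return None
    # Index of the day with the highest number of new cases.
    peak = max(range(len(daily_new_cases)), key=daily_new_cases.__getitem__)
    for offset in range(peak, len(daily_new_cases)):
        if daily_new_cases[offset] < threshold:
            return start + timedelta(days=offset)
    return None
```

For a forecast that never drops below the threshold within its horizon, the function returns `None` rather than guessing a date.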

More Details

(this section was added in response to Peter Schmidt's good question)

Using China as a reference means that it is assumed all countries will eventually control the pandemic in a similar fashion to China, i.e. inflection points will come for certain (if they haven't yet). The assumption is plausible (imo) because most countries now do pretty much the same as what China did two months ago.

Following this reasoning, we fit a sigmoid function with a shift in x (different countries start responding at different times) and two scaling factors, one in x and one in y. This calibrates both the time span a country needs to control the pandemic and the scale of infection. The parameters that give the best estimate of the new cases over the past n days are selected and used to extrapolate the number of new cases.
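A fit of this kind could be sketched as follows. This is only an illustration under stated assumptions, not the code behind the forecasts: the names `sigmoid`, `fit_sigmoid` and `forecast` are hypothetical, the search is a plain grid search over candidate shifts and x-scales (the post does not name its optimizer), and the y-scale is obtained in closed form by least squares for each candidate pair. The input series is whatever case count the sigmoid is meant to describe.

```python
import math

def sigmoid(t, shift, x_scale, y_scale):
    """Logistic curve shifted by `shift` days, stretched by `x_scale` (> 0)
    along the time axis and by `y_scale` along the case axis."""
    z = (t - shift) / x_scale
    if z < -700.0:  # avoid overflow in math.exp for very early days
        return 0.0
    return y_scale / (1.0 + math.exp(-z))

def fit_sigmoid(cases, n_recent, shifts, x_scales):
    """Grid-search shift and x_scale; solve y_scale by least squares.

    Only the last `n_recent` days of `cases` are used to score a candidate.
    Returns the (shift, x_scale, y_scale) with the smallest squared error.
    """
    if not 0 < n_recent <= len(cases):
        raise ValueError("n_recent must be between 1 and len(cases)")
    days = range(len(cases) - n_recent, len(cases))
    best = None
    for shift in shifts:
        for x_scale in x_scales:
            unit = [sigmoid(t, shift, x_scale, 1.0) for t in days]
            denom = sum(u * u for u in unit)
            if denom == 0.0:
                continue  # curve is flat zero over the window; skip
            # Least-squares optimum of y_scale for a fixed curve shape.
            y_scale = sum(cases[t] * u for t, u in zip(days, unit)) / denom
            error = sum((cases[t] - y_scale * u) ** 2 for t, u in zip(days, unit))
            if best is None or error < best[0]:
                best = (error, shift, x_scale, y_scale)
    if best is None:
        raise ValueError("no candidate parameters produced a usable fit")
    return best[1:]

def forecast(params, first_day, horizon):
    """Extrapolate the fitted curve for `horizon` days starting at `first_day`."""
    shift, x_scale, y_scale = params
    return [sigmoid(t, shift, x_scale, y_scale)
            for t in range(first_day, first_day + horizon)]
```

Days are counted as integer offsets from the first entry of `cases`, so the fitted `shift` can be converted back to a calendar date by adding it to the date of the first observation.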

The optimal shift allows us to estimate when a country started responding to the disease. As an intuitive check of the model, we looked at 3 countries with known response dates; the estimates are not off by much:

China: actual = 23-jan, estimated = 26-jan (3 days off)
South Korea: actual = 25-feb, estimated = 20-feb (5 days off; people started self-isolating from 20-feb though)
Germany: actual = 11-mar, estimated = 9-mar (2 days off)
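The "days off" figures above can be reproduced with a few lines of Python. The dictionary layout and the helper name `days_off` are choices made for this example; the dates are the ones listed above, taken to be in 2020.

```python
from datetime import date

# (actual response date, estimated response date), as listed in the post
response_dates = {
    "China": (date(2020, 1, 23), date(2020, 1, 26)),
    "South Korea": (date(2020, 2, 25), date(2020, 2, 20)),
    "Germany": (date(2020, 3, 11), date(2020, 3, 9)),
}

def days_off(actual, estimated):
    """Absolute difference in days between the actual and estimated dates."""
    return abs((estimated - actual).days)

errors = {country: days_off(actual, estimated)
          for country, (actual, estimated) in response_dates.items()}
mean_error = sum(errors.values()) / len(errors)
```

With only three countries, `mean_error` is a rough sanity check rather than a proper validation of the model.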


The model is obviously simple and susceptible to errors (especially for countries with low numbers), but it is somewhat useful given the limited information we have so far.

Comment

Comment by Mohamed A. Maksoud on March 23, 2020 at 11:45am

Peter,

Using China as a reference means that it is assumed all countries will eventually control the pandemic in a similar fashion to China, i.e. inflection points will come for certain (if they haven't yet). The assumption is plausible (imo) because most countries now do pretty much the same as what China did two months ago.

Following this reasoning, we fit a sigmoid function with a shift in x (different countries start responding at different times) and two scaling factors, one in x and one in y. This calibrates both the time span a country needs to control the pandemic and the scale of infection. The parameters that give the best estimate of the new cases over the past n days are selected and used to extrapolate the number of new cases.

The optimal shift allows us to estimate when a country started responding to the disease. As an intuitive check of the model, we looked at 3 countries with known response dates; the estimates are not off by much:

China: actual = 23-jan, estimated = 26-jan (3 days off)
South Korea: actual = 25-feb, estimated = 20-feb (5 days off; people started self-isolating from 20-feb though)
Germany: actual = 11-mar, estimated = 9-mar (2 days off)


The model is obviously simple and susceptible to errors (especially for countries with low numbers), but it is somewhat useful given the limited information we have so far.

Comment by Peter Schmidt on March 23, 2020 at 10:33am

Interesting work. Could you elaborate more on the technique? If you used China as the example, they have at least made it through the two inflection points and show a sigmoidal curve. However, Italy or the US, for example, have not. So can you share in detail how you made the fits and then applied the predictions?

Comment by Mohamed A. Maksoud on March 23, 2020 at 10:09am

New cases are the number of confirmed cases, so yes, they are dependent on the testing coverage.

As for whether the testing is censored, there is no accessible data about the testing strategy in every country of the world. China applied aggressive testing at all gatherings, whereas South Korea did targeted testing on everyone suspected to have been in contact with an infected person. I wish there were more accessible data to factor into the forecasts.

In the end, both countries ended up with a more or less sigmoid curve. The assumption behind these predictions is that all countries are going to respond to the virus (this is the case) and that their approach only affects the curve's stretch in X and Y, while the curve still follows a sigmoid function.

Comment by Patrick Stroh on March 23, 2020 at 3:57am

How are new cases measured?  Are they dependent on how widespread testing is?  Is the testing censored (only those with strong symptoms)?  How does China testing compare to US testing, etc.?
