In 1927, W. O. Kermack and A. G. McKendrick described one of the first mathematical models for infectious diseases, based on a set of differential equations. The model is called SIR after the three states an individual can be in.
These states are: Susceptible (S), Infected (I) and Recovered (R).
The equations that represent these states are as follows:
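In the notation most textbooks use, which is assumed here because the text does not name its symbols, $S$, $I$ and $R$ are the numbers of susceptible, infected and recovered individuals, $\beta$ is the transmission rate and $\gamma$ is the recovery rate. The SIR system is then commonly written as follows. The numbering (1) to (3) simply follows the order S, I, R and is also an assumption.

```latex
\begin{align}
\frac{dS}{dt} &= -\beta S I \tag{1}\\
\frac{dI}{dt} &= \beta S I - \gamma I \tag{2}\\
\frac{dR}{dt} &= \gamma I \tag{3}
\end{align}
```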
The initial conditions are:
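A common choice, given here as an assumption rather than as the article's exact statement, fixes the state of the population at $t = 0$, with $S_0$ and $I_0$ denoting the initial numbers of susceptible and infected individuals:

```latex
S(0) = S_0 > 0, \qquad I(0) = I_0 > 0, \qquad R(0) = 0
```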
The analytical solution of this system can be found in several articles, for example arXiv:1403.2160.
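Instead, I will focus on equation (2) and note that it is a Bernoulli equation, that is, an equation of the general form $\frac{dy}{dt} + P(t)\,y = Q(t)\,y^{n}$, where $n$ is a real number other than 0 or 1.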
The solution of this Bernoulli differential equation is the logistic function, whose most general form is:
In the epidemiological context, this logistic function represents the cumulative number of infected people as a function of time.
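As a worked illustration of why a Bernoulli equation leads to a logistic curve, consider the simplest case with exponent 2, the logistic growth equation. The symbols $r$ (growth rate), $K$ (plateau) and $y_0$ (initial value) are introduced only for this sketch and are not the article's notation.

```latex
% Logistic growth equation, rewritten in Bernoulli form (n = 2):
\frac{dy}{dt} = r\,y\left(1 - \frac{y}{K}\right)
\quad\Longleftrightarrow\quad
\frac{dy}{dt} - r\,y = -\frac{r}{K}\,y^{2}

% The substitution u = 1/y turns it into a linear equation:
\frac{du}{dt} + r\,u = \frac{r}{K}
\quad\Longrightarrow\quad
u(t) = \frac{1}{K} + \left(\frac{1}{y_0} - \frac{1}{K}\right)e^{-rt}

% Inverting u gives the logistic function:
y(t) = \frac{K}{1 + \dfrac{K - y_0}{y_0}\,e^{-rt}}
```

The curve starts at $y_0$, grows almost exponentially at first, and levels off at $K$ as $t$ grows.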
The model can be fitted to real data to estimate its parameters. This is done by minimizing the residuals, as measured by a loss function.
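For a least-squares fit, the loss is the residual sum of squares. The notation here is assumed: $y_i$ is the observed cumulative count on day $t_i$, and $f(t;\theta)$ is the logistic curve with parameter vector $\theta$.

```latex
L(\theta) = \sum_{i=1}^{n} \bigl(y_i - f(t_i;\theta)\bigr)^{2},
\qquad
\hat{\theta} = \arg\min_{\theta} L(\theta)
```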
Because the function to be fitted is nonlinear, the method used to minimize the loss function must be suitable for nonlinear regression. For this regression I used R's nls function, which uses the Gauss-Newton algorithm by default.
The data are the cumulative number of infected people in Spain over time, as published by the Spanish Ministry of Health.
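To make the fitting step concrete, here is a minimal sketch of a Gauss-Newton fit with `nls` on synthetic data. Everything in it is an assumption made for illustration: the four-parameter form of `logis` (the article uses a function of that name without showing its definition), the "true" parameter values, the noise level and the starting values. The data are simulated, not real case counts.

```r
# Hypothetical four-parameter logistic (an assumption, not the article's definition):
#   a = lower asymptote, b = upper asymptote, c = midpoint day, d = time scale in days
logis <- function(x, a, b, c, d) {
  a + (b - a) / (1 + exp(-(x - c) / d))
}

# Synthetic, noisy observations generated from known parameters (NOT real data).
set.seed(1)
dia   <- 1:60
casos <- logis(dia, a = 0, b = 1000, c = 30, d = 4) + rnorm(length(dia), sd = 10)
toy   <- data.frame(dia = dia, Casos = casos)

# nls() minimizes the residual sum of squares; Gauss-Newton is its default algorithm.
# Starting values are deliberately close to, but not equal to, the true parameters.
fit <- nls(Casos ~ logis(dia, a, b, c, d), data = toy,
           start = list(a = 1, b = 900, c = 28, d = 5))

# Estimated parameters; they should land near the values used to simulate the data.
est <- coef(fit)
print(round(est, 2))
```

Gauss-Newton is sensitive to the starting point, so in practice the initial guesses for the plateau and the midpoint are usually read off a plot of the data before calling `nls`.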
This graph represents the data.
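The regression can be run in R as follows.
library(readr)   # read_csv, cols, col_double, col_date, col_skip
library(dplyr)   # %>%, group_by, summarize

# Read the series; columns not listed in cols() (such as Casos) are guessed by readr.
descarga <- read_csv("serie_historica_acumulados.csv",
  col_types = cols(Fallecidos = col_double(), Fecha = col_date(format = "%d/%m/%Y"),
                   Hospitalizados = col_double(), Recuperados = col_double(),
                   UCI = col_double(), X8 = col_skip()))

# Sum all rows that share the same date.
agregados_por_fecha <- descarga %>%
  group_by(Fecha) %>%
  summarize(Fallecidos = sum(Fallecidos), Casos = sum(Casos), Hospitalizados = sum(Hospitalizados),
            UCI = sum(UCI), Recuperados = sum(Recuperados))

# Day index (1, 2, 3, ...) used as the regressor in the fit.
agregados_por_fecha$dia <- seq_along(agregados_por_fecha$Fecha)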
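# Fit the logistic curve by nonlinear least squares.
# logis() must already be defined; its definition is not part of this listing.
logis.m1 <- nls(Casos ~ logis(dia, a, b, c, d), data = agregados_por_fecha,
                start = list(a = 0, b = 180000, c = 40, d = 5))
summary(logis.m1)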
Formula: Casos ~ logis(dia, a, b, c, d)

Parameters:
    Estimate Std. Error t value Pr(>|t|)
a -2.320e+03  5.344e+02  -4.342 0.000115 ***
b  1.788e+05  2.111e+03  84.706  < 2e-16 ***
c  3.914e+01  1.317e-01 297.217  < 2e-16 ***
d  5.362e+00  1.033e-01  51.920  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
This graph represents the data and the regression curve.
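A regression curve of this kind can be traced from the reported estimates alone. The sketch below assumes a particular four-parameter form for `logis`, which is a guess because the listing does not show the function, so the interpretation of `b` as the plateau and `c` as the day of fastest growth holds only under that assumption. The 120-day range is also arbitrary.

```r
# Estimates copied from the nls summary reported in the article.
est <- c(a = -2.320e+03, b = 1.788e+05, c = 3.914e+01, d = 5.362e+00)

# Hypothetical parameterization (an assumption; the article does not show logis()).
logis <- function(x, a, b, c, d) a + (b - a) / (1 + exp(-(x - c) / d))

# Evaluate the assumed curve on an arbitrary range of days.
dia   <- 1:120
curva <- logis(dia, est[["a"]], est[["b"]], est[["c"]], est[["d"]])

# Under this assumed form, the cumulative curve levels off near b ...
plateau <- max(curva)

# ... and the day followed by the largest one-day increase lies next to c.
steepest_day <- dia[which.max(diff(curva))]

# plot(dia, curva, type = "l") would draw the curve itself.
print(c(plateau = plateau, steepest_day = steepest_day))
```

Because the quantity being modeled is cumulative, a plateau means that new infections have stopped, not that the number of currently infected people stays high.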
Conclusions:
Comment
Hi Jason.
The data is for cumulative cases. Of course most of these people will recover after a while, but the analysis is focused on the infection process.
Regards
This seems to model a scenario where the number of COVID-19 infections eventually plateaus instead of declining back to zero, which would suggest that life will never return to normal. That is, unless the plot's y-axis represents the total cumulative case count since the first day, without subtracting the cases that have recovered. The plot has too little description or annotation to settle on one interpretation.
Thanks Peter. I will check it out. Regards
Could you please share the R codes and the data files in English to replicate the results at:
© 2020 Data Science Central ®