
Eight ways in which data science is helping in the fight against COVID19

Given the scale of its impact and how deeply it has altered our lives, COVID19 is one of the most disruptive crises of our times. Although it is not the only pandemic that humanity has been through, COVID19 is occurring in the time of the fourth industrial revolution, where everyone and everything is one click away, and where the abundance of data and computing power has allowed machines to be more intelligent than ever. In the age of deep tech and data, data science is at the core of how we are facing the pandemic and paving the way for a new normal. This article provides a non-exhaustive list of use cases in which data science has been leveraged to provide emergency response during COVID and facilitate post-COVID recovery.

Understanding the virus

Collecting and analyzing medical data from the early stages of the outbreak allowed experts to understand what the virus is all about. From detecting its symptoms, to comparing its impact on different profiles of individuals and communities, to evaluating its spread, medical experts were able to pinpoint similarities and differences with other viruses such as the seasonal flu, SARS and MERS. In particular, computing the fatality ratio (~1%) and the transmission rate (each infected person passing the virus on to between 2 and 3 others) of COVID19 proved critical for deciding on the type of measures required to limit its spread and deadliness.
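To make these two figures concrete, here is a minimal Python sketch of how a fatality ratio is computed from counts, and of how quickly cases multiply when each case infects 2 or 3 others. The function names and the counts are hypothetical, chosen only for illustration, and the growth function assumes no immunity and no interventions.

```python
def case_fatality_ratio(deaths: int, confirmed_cases: int) -> float:
    """Share of confirmed cases that ended in death."""
    if confirmed_cases <= 0:
        raise ValueError("confirmed_cases must be positive")
    return deaths / confirmed_cases


def cases_in_generation(r: float, generation: int, seed_cases: int = 1) -> float:
    """Expected new cases in a given transmission generation,
    assuming every case infects r others (no immunity, no interventions)."""
    return seed_cases * r ** generation


# Hypothetical counts, picked only so that the ratio comes out at 1%.
print(case_fatality_ratio(deaths=50, confirmed_cases=5000))

# The gap between a transmission rate of 2 and of 3 after ten generations.
for r in (2.0, 3.0):
    print(r, cases_in_generation(r, generation=10))
```

After ten generations, a rate of 3 yields roughly 58 times as many new cases as a rate of 2, which is why even small reductions in transmission matter.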

Identifying communities at risk

Understanding the virus through early-stage data made it possible to identify the individuals and communities most at risk. For instance, demographic data, coupled with the knowledge that the virus most seriously impacts males aged 70 and above, allows more customized targeting of countries, counties, cities and towns where this particular group makes up a significant share of the local population. Similarly, given that the virus most seriously impacts people with respiratory problems or underlying health conditions such as cancer or diabetes, medical records have been used in multiple countries to liaise with this category of people and their immediate family, in order to minimize their exposure to the virus. Beyond demographic data and medical records, socio-economic data proves useful in identifying communities at risk. For instance, physical distancing is harder to enforce in cultures where intergenerational living is common, or in dense zones like refugee camps and slums. In this context, preventing the virus from getting into the community in the first place is the most suitable measure, which is why countries like India opted for strict lockdown in the early stages of the virus despite the drastic impact on the economy.

Symptoms and contact tracing

As testing has not been feasible at scale in many countries, relying on individuals to report their COVID19 symptoms and who they have been in contact with could be the most effective replacement for mass testing. In countries like China and South Korea, people willingly report their daily temperatures and any COVID-related symptoms such as cough and fever to local government through apps which also track their movement and who they have been in contact with. This granular tracing of the virus and its spread allowed a more effective response and early containment of the virus in places such as Hong Kong, Singapore and South Korea. It also allowed local authorities to produce visual maps of virus spread at different granularities (city-level, neighborhood-level…). In Europe and the US, the public has been more reluctant to accept such measures given their potential long-term implications for data privacy. However, multiple European countries developed their own versions of tracing apps that prioritize the public good while taking into account the privacy concerns of their citizens and residents. In the meantime, tech companies such as Facebook are leveraging their wide reach to enable individuals to report their symptoms if they choose to, with the goal of exposing this data at an aggregated level to research institutions and the World Health Organization.
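As an illustration of the tracing idea, the sketch below treats reported contacts as a graph and walks outward from a confirmed case to find who should be notified. It is a simplified assumption of how such a lookup could work, not a description of any particular app: real systems also weigh contact duration, proximity and dates, and the names and the `contacts_within` function are hypothetical.

```python
from collections import deque


def contacts_within(contact_log, index_case, max_hops):
    """Return {person: hops} for everyone linked to index_case
    through at most max_hops reported contacts (breadth-first search)."""
    neighbours = {}
    for a, b in contact_log:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    hops = {index_case: 0}
    queue = deque([index_case])
    while queue:
        person = queue.popleft()
        if hops[person] == max_hops:
            continue  # do not expand beyond the requested distance
        for other in neighbours.get(person, ()):
            if other not in hops:
                hops[other] = hops[person] + 1
                queue.append(other)

    del hops[index_case]  # the index case is not their own contact
    return hops


# Hypothetical self-reported contact pairs.
contact_log = [("ana", "ben"), ("ben", "chloe"), ("chloe", "dev"), ("eve", "fay")]
print(contacts_within(contact_log, "ana", max_hops=2))
```

Here "ben" is a direct contact of "ana" and "chloe" a second-degree one, while "dev", "eve" and "fay" fall outside the two-hop window.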

Modeling the evolution of the virus

Given how quickly COVID19 spreads, modeling the potential growth of the virus in different geographies and under different scenarios was key for deciding on appropriate containment measures. Specifically, using actual infection and death figures per country, forecasting experts modeled the virus evolution ‘curves’ both with and without social distancing measures. Social distancing measures were adopted based on the outcome of this data modeling, particularly given their ability to flatten the curve and lower its peak. Data modeling was also key for assessing the effectiveness of physical distancing by comparing actual death and infection figures with modeling predictions. Finally, given that infection figures were mostly underestimated due to the lack of mass testing, statistical inference was used to extrapolate the number of real infections from the number of deaths in different countries and cities.
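The flattening effect can be reproduced with a very small simulation. The sketch below uses a textbook SIR (susceptible, infected, recovered) model with a daily time step. It is a generic illustration, not the specific models forecasters used, and the population size, contact rates and fatality ratio are assumed values. The last function shows the simplest form of the deaths-based extrapolation, ignoring the delay between infection and death.

```python
def simulate_sir(population, initial_infected, beta, gamma, days):
    """Discrete-time SIR model. beta is the daily transmission rate,
    gamma the daily recovery rate. Returns the number of people
    currently infected on each day."""
    susceptible = float(population - initial_infected)
    infected = float(initial_infected)
    infected_by_day = [infected]
    for _ in range(days):
        new_infections = beta * susceptible * infected / population
        new_recoveries = gamma * infected
        susceptible -= new_infections
        infected += new_infections - new_recoveries
        infected_by_day.append(infected)
    return infected_by_day


def estimate_infections_from_deaths(deaths, infection_fatality_ratio):
    """Back out total infections from deaths, given an assumed fatality ratio."""
    return deaths / infection_fatality_ratio


# Assumed parameters: beta / gamma is 2.5 without distancing, 1.5 with it.
baseline = simulate_sir(1_000_000, 10, beta=0.5, gamma=0.2, days=300)
distanced = simulate_sir(1_000_000, 10, beta=0.3, gamma=0.2, days=300)

for label, curve in (("no distancing", baseline), ("distancing", distanced)):
    peak = max(curve)
    print(f"{label}: peak of {peak:,.0f} infected on day {curve.index(peak)}")

# Hypothetical: 500 recorded deaths and an assumed 1% fatality ratio.
print(round(estimate_infections_from_deaths(500, 0.01)))
```

With distancing, the peak is both lower and later, which is what gives hospitals time to cope.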

Cumulative confirmed COVID19 cases in selected countries as of March 19, 2020

Source: Johns Hopkins University

Optimizing medical resources allocation

One major deficiency that the COVID crisis exposed was the lack of medical readiness and the disparity in the distribution of medical resources locally and globally. In particular, the shortage of medical staff (doctors and nurses) and of medical equipment (ventilators, PPE, beds) in some of the areas most impacted by COVID19 contributed to a high number of fatalities. A more optimized allocation of medical resources between countries (e.g. different countries in the EU), between states, or even within the same city could have made medical help available on time for the people who needed it the most. At its heart, this optimized allocation is a mathematical optimization problem that uses current and forecasted virus figures to minimize the number of fatalities, while taking into account constraints on local medical resources and the feasibility of moving resources across geographies. While this kind of optimization went largely missing during this pandemic, there is still room to apply it in future pandemics or in case we are faced with a second peak of the virus.
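One way to write this allocation problem down is as a linear program. The formulation below is an illustrative sketch rather than a model taken from any deployed system, and all symbols are assumptions introduced here: $s_i$ is the stock of a resource (say, ventilators) in region $i$, $d_j$ the forecasted need in region $j$, $x_{ij}$ the quantity sent from $i$ to $j$ (with $x_{ii}$ the quantity kept locally), $c_{ij}$ the maximum quantity that can feasibly be moved from $i$ to $j$, $u_j$ the unmet need in region $j$, and $w_j$ the expected fatalities per unit of unmet need.

```latex
\begin{aligned}
\min_{x,\,u} \quad & \sum_{j} w_j \, u_j \\
\text{subject to} \quad
  & \sum_{j} x_{ij} \le s_i            && \text{for every region } i \quad \text{(cannot ship more than the local stock)} \\
  & u_j \ge d_j - \sum_{i} x_{ij}      && \text{for every region } j \quad \text{(unmet need after deliveries)} \\
  & 0 \le x_{ij} \le c_{ij}            && \text{for every pair } (i, j) \quad \text{(feasibility of movement)} \\
  & u_j \ge 0                          && \text{for every region } j
\end{aligned}
```

Setting $c_{ij} = 0$ rules out a route entirely, and the forecasted needs $d_j$ would come from epidemic models of the kind described in the previous section.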

Evaluating testing, treatment, and vaccine options

As different testing and treatment techniques were or are being investigated, from PCR and antigen testing to antibody plasma injection, deciding on the best technique requires a data-driven comparison of their respective effectiveness on sample populations. In this context, deciding on the most effective testing technique required measuring the accuracy (false positive and false negative rates) of different techniques on populations that are comparable in terms of demographics and rates of infection. On the other hand, deciding on the right treatment or vaccine requires giving each candidate to a different trial group and making sure there is no overlap between groups. Once this is done, rates of infection should be compared between the groups after a similar time period. The infection rate of each trial group should also be compared to the rate of infection in a control population that did not receive any treatment or vaccine.
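Both comparisons come down to a few ratios. The sketch below computes the error rates of a test from its confusion counts, and the relative reduction in infection rate between a trial group and an untreated control group. The function names and all counts are hypothetical, and a real evaluation would add confidence intervals and check that the groups are comparable.

```python
def diagnostic_rates(true_pos, false_pos, true_neg, false_neg):
    """Error rates of a test, measured against known infection status."""
    infected = true_pos + false_neg
    healthy = true_neg + false_pos
    return {
        "sensitivity": true_pos / infected,         # infected people correctly flagged
        "specificity": true_neg / healthy,          # healthy people correctly cleared
        "false_negative_rate": false_neg / infected,
        "false_positive_rate": false_pos / healthy,
    }


def relative_risk_reduction(cases_trial, size_trial, cases_control, size_control):
    """1 minus the ratio of infection rates (trial group vs. control group)."""
    rate_trial = cases_trial / size_trial
    rate_control = cases_control / size_control
    return 1 - rate_trial / rate_control


# Hypothetical test evaluated on 100 infected and 1,000 healthy people.
print(diagnostic_rates(true_pos=90, false_pos=50, true_neg=950, false_neg=10))

# Hypothetical trial: 10 infections among 1,000 vaccinated, 50 among 1,000 controls.
print(relative_risk_reduction(10, 1000, 50, 1000))
```

In the hypothetical trial the vaccinated group has one fifth of the control group's infection rate, which corresponds to a reduction of about 80%.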


Automating COVID19 medical diagnosis

The COVID19 crisis has seen many deep tech startups leveraging image and voice recognition to facilitate COVID19 medical diagnosis. Arterys is an example of a startup using medical imaging to estimate the likelihood of COVID19 infection from lung CT scans and X-rays. Novoic is an Oxford-based startup developing triaging/screening for suspected COVID-19 cases based on cough sounds and other medical data from thousands of patients. The image and voice recognition models developed by these startups rely on a branch of machine learning called deep learning. Inspired by the structure and operation of the human brain, deep learning models rely on large volumes of unstructured image, text or audio data to shape a multi-layered neural network which transforms an input, such as a lung X-ray or a cough sound, into a mathematical probability reflecting, for instance, someone’s likelihood of having COVID19.
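To show what "transforming an input into a probability" means mechanically, here is a toy forward pass through a two-layer network in plain Python. It is only a sketch: the weights are made up rather than learned, the three input features are hypothetical, and it has no relation to the models built by the startups mentioned above. Real diagnostic networks have many more layers and millions of trained parameters.

```python
import math

# Made-up parameters for illustration. In practice these are learned
# from large volumes of labeled images or audio recordings.
HIDDEN_WEIGHTS = [[0.5, -0.3, 0.8], [-0.6, 0.9, 0.1]]
HIDDEN_BIASES = [0.1, -0.2]
OUTPUT_WEIGHTS = [[1.2, -0.7]]
OUTPUT_BIAS = [0.05]


def dense(inputs, weights, biases):
    """One fully connected layer: weighted sum of the inputs plus a bias."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def relu(values):
    """Non-linearity applied between layers."""
    return [max(0.0, v) for v in values]


def sigmoid(x):
    """Squash any real number into the (0, 1) range."""
    return 1.0 / (1.0 + math.exp(-x))


def predict_probability(features):
    """Input features -> hidden layer -> single output probability."""
    hidden = relu(dense(features, HIDDEN_WEIGHTS, HIDDEN_BIASES))
    logit = dense(hidden, OUTPUT_WEIGHTS, OUTPUT_BIAS)[0]
    return sigmoid(logit)


# Three hypothetical numeric features extracted from a scan or a recording.
print(predict_probability([0.2, 0.7, 0.1]))
```

Training consists of adjusting the weights so that the output probability matches known diagnoses as closely as possible across the whole dataset.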

Accelerating socio-economic emergency response

Although it is primarily a health crisis, COVID19 has resulted in a socio-economic crisis that has already translated into waves of mass unemployment across the board. Indeed, the US alone has seen more than 36 million unemployment applications filed in the past 2 months. At the same time, the mentally and physically vulnerable, including the elderly, found themselves more isolated than ever. In the face of this unprecedented economic and social adversity, data science plays a crucial role in matching people to the opportunities and support they desperately need. In this context, we have seen a number of hackathons organized by the World Health Organization, the Chan Zuckerberg Initiative as well as many tech companies, NGOs and individuals to accelerate the socio-economic emergency response to the pandemic. Many useful tech products came out of such initiatives, including platforms that match people feeling isolated or sick with volunteers who can take care of their grocery or medicine shopping, or who can provide mental support throughout the lockdown. Similarly, platforms like helpyourhightstreet were built to connect people to small local businesses, so they can support them by shopping locally online or by paying in advance for services that they can benefit from after lockdown. This support helped keep small businesses running throughout the lockdown. Larger businesses like Uber or Airbnb, which found themselves forced to lay off part of their tech workforce, developed data-driven tools to showcase the skills and experience of their ex-employees and facilitate their future job placement. On the other hand, customized training platforms could help many gig economy workers who lost their jobs, like Uber drivers and brand advertisers, upskill and pivot their careers.



Tags: covid19, data science, emergency response



Comment by Vincent Granville on July 11, 2020 at 8:53am

Is there a way to estimate the number of people who were never tested, yet were positive at some point, and either (1) recovered fine, (2) died, (3) recovered but still sick after two months? My guess is that it is well over 20,000,000 for (1), but I have no data to support that guess. 

