
*Summary: This blog is part III of a series showcasing management and analytics of the daily World Covid-19 case/death data published by the Center for Systems Science and Engineering at Johns Hopkins University. Whereas parts I & II focused on U.S. data, part III looks at the World as well. Of particular interest are moving averages of new cases and deaths, in addition to the case fatality rate: deaths as a percentage of total cases. The technology deployed is R driven by its splendid data.table package. Analysts with several months of R experience should benefit from the notebook below.*

A little over a month ago, 45 made the provocative assertion that 99% of covid-19 cases are benign. “We have tested over 40 million people. By so doing, we show cases, 99 percent of which are totally harmless.” Press secretary Kayleigh McEnany attempted to cover the potus by proffering “What the president is noting is that, at the height of this pandemic, we were at 2,500 deaths per day. We are now at a place where, on July Fourth, there were 254; that’s a tenfold decrease in mortality.”

Apparently, the argument was that since daily fatalities were just around 1% of new daily cases at that point, the other 99% were innocuous. Indeed, Dr. Anthony Fauci interpreted the argument that way, countering though: "I'm trying to figure out where the president got that number," Fauci said. "What I think happened is that someone told him that the general mortality is about 1%. And he interpreted, therefore, that 99% is not a problem, when that's obviously not the case."

The percentage of a disease’s cases that result in death is called the case fatality rate (CFR) and is often computed as simply the ratio of total fatalities to total cases. A better rate would tie individual cases to their outcomes, but that is generally impractical, as it is now. And there is usually a lag between new cases and fatalities, so a calculation that accounted for that difference would be welcome. In the end, though, the CFR is often just computed as cumulative fatalities divided by cumulative cases.
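The naive calculation, and the lag-adjusted variant mentioned above, can be sketched in a few lines of R. The counts below are invented for illustration, not actual JHU figures.

```r
# Naive case fatality rate: cumulative deaths over cumulative cases.
# All counts here are illustrative, not actual JHU figures.
cum_cases  <- 4500000
cum_deaths <- 150000

cfr <- 100 * cum_deaths / cum_cases
sprintf("naive CFR: %.2f%%", cfr)

# A lag-adjusted variant would divide today's cumulative deaths by the
# cumulative cases from, say, two weeks earlier, acknowledging that
# deaths trail case confirmations.
cum_cases_14d_ago <- 3800000
cfr_lagged <- 100 * cum_deaths / cum_cases_14d_ago
```

The two-week offset is an assumption for the sketch; the appropriate lag is itself a modeling question.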

The current day's Johns Hopkins CSSE data are available for download at midnight CDT daily. For both the U.S. and the World, there are case and death files with similar structures. The granularity is geographic: country for the World files, county within state for the U.S. A new column is added each day detailing the cumulative counts for each geography. Data munging revolves around melting (pivoting) the wide data into long R data.tables and computing daily counts as differences of successive cumulative records.
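The melt-then-difference step can be sketched on a toy frame that mimics the wide JHU layout (one row per geography, one cumulative-count column per date); the numbers are made up.

```r
library(data.table)

# Toy frame mimicking the wide JHU layout: one row per geography,
# one cumulative-count column per date. Counts are illustrative.
wide <- data.table(
  country  = c("A", "B"),
  `7/1/20` = c(100, 50),
  `7/2/20` = c(120, 55),
  `7/3/20` = c(150, 70)
)

# Melt to long form: one row per geography per date.
long <- melt(wide, id.vars = "country",
             variable.name = "date", value.name = "cumcases")
long[, date := as.Date(date, format = "%m/%d/%y")]
setorder(long, country, date)

# Daily counts as differences of successive cumulative records.
long[, newcases := cumcases - shift(cumcases, fill = 0), by = country]
long
```

The `by = country` clause keeps the differencing within each geography, so the first record of one country never subtracts the last record of another.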

One problem with the data for both the U.S. and the World is that cases/fatalities tend to be underreported on weekends, which, coupled with an often one-day lag in reporting, produces significantly lower counts on Sunday and Monday. I work around this problem by emphasizing moving averages over daily counts.
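A trailing seven-day moving average, computed with data.table's `frollmean`, irons out the weekend dips because every window contains exactly one of each weekday. The daily counts below are fabricated to show the depressed Sunday/Monday pattern.

```r
library(data.table)

# Illustrative daily new-case counts with depressed weekend reporting
# (the dips at positions 5-6 and 12-13); values are made up.
daily <- data.table(
  date     = seq(as.Date("2020-07-01"), by = "day", length.out = 14),
  newcases = c(60, 62, 65, 58, 20, 25, 63, 66, 68, 70, 64, 22, 28, 71)
)

# Trailing 7-day moving average; the first six entries are NA because
# a full window is not yet available.
daily[, ma7 := frollmean(newcases, 7)]
daily
```

Each window spans a full week, so the Sunday/Monday shortfall is averaged against five normal reporting days rather than distorting the trend.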

After loading and munging the data, I assemble functions to report on cases/deaths using powerful data.table syntax. Some of these functions then feed ggplot visuals that demonstrate the disease's workings over time. The grouping power of data.table allows country/state-level case-death reports to be generated in a few statements.
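The grouped-report idea can be illustrated in a single data.table statement. The table and figures below are invented; the actual notebook's functions and column names may differ.

```r
library(data.table)

# Toy long-form table of daily new cases and deaths by country
# (figures are invented for illustration).
covid <- data.table(
  country   = rep(c("A", "B"), each = 3),
  date      = rep(seq(as.Date("2020-07-01"), by = "day", length.out = 3), 2),
  newcases  = c(100, 120, 130, 40, 45, 50),
  newdeaths = c(2, 3, 3, 1, 1, 2)
)

# One grouped statement yields a per-country report of totals and CFR.
report <- covid[, .(cases  = sum(newcases),
                    deaths = sum(newdeaths),
                    cfr    = round(100 * sum(newdeaths) / sum(newcases), 2)),
                by = country]
report
```

Swapping `by = country` for, say, `by = .(country, month(date))` would produce a month-by-country breakdown with no other changes, which is the kind of leverage the grouping syntax provides.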

The supporting platform is a Wintel 10 notebook with 128 GB RAM, along with software JupyterLab 1.2.4 and R 4.0.2. The R data.table, tidyverse, pryr, plyr, fst, feather, and knitr packages are featured, as well as functions from my personal stash, detailed below. Read the entire blog here.


Posted 29 March 2021

© 2021 TechTarget, Inc.
