When the coronavirus broke out in the city of Wuhan, in China’s Hubei province, in December 2019, Taiwan was immediately in harm’s way. Taiwan is less than 100 miles from mainland China, and as recently as January there were a dozen round-trip flights between Wuhan and Taipei every week. Yet as of March 19, Taiwan has only 108 cases of the coronavirus and one death. Why?
Taiwan drew on its data from the 2003 SARS (Severe Acute Respiratory Syndrome) epidemic, which infected over 8,000 people globally and killed 774, including 299 in Hong Kong, and from the 2009 H1N1 swine flu, which killed 56 people. South Korea is another country in close physical proximity to China, with heavy travel and trade between the two. But as of March 19, South Korea seems to “be over the Coronavirus hump,” with the number of Active Cases in decline (see Figure 1).
Why have Taiwan and South Korea been so successful in containing the coronavirus while other countries seem to struggle? Yes, they have data about the current coronavirus outbreak. But they also had historical data and insights into outcomes from SARS in 2003 and the H1N1 swine flu in 2009 that allowed them to model what was likely to happen with the coronavirus in 2020. Taiwan and South Korea were able to make proactive policy decisions about early and broad-scale testing, sanitary vigilance, “social distancing” and the quarantining of sick and suspected individuals because they had what some folks would consider to be “obsolete data.”
Don’t Write Off That Obsolete Data Too Quickly…
Some folks are very quick to dismiss “old” data as “obsolete,” arguing that the value of data erodes over time. However, historical data can be just as valuable as current data in trying to ascertain the potential business and operational impact of current situations or events. There are several categories of events where “obsolete data” may be quite valuable.
Annual Events are events such as the Super Bowl, NCAA Basketball March Madness and the World Series that heavily impact commerce in a different city (the host city) every year. Because each of these events has a different host city each year, the current host city can learn much from the historical data of previous host cities, such as patterns, trends and propensities around airport traffic; train usage; traffic jams and traffic accidents; hotel, restaurant and bar occupancy; crime incidents and police reports; and ambulance requests and hospital visits. For example, the Super Bowl is estimated to bring an additional $300 to $500 million of economic activity to a host city. There are a lot of visitor, residential, commercial, operational and economic insights that could be gleaned from the data of prior Super Bowls that might be useful in the current host city’s planning (see Figure 2).
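To make this concrete, here is a minimal Python sketch of one simple way a host city might reuse prior host cities’ data: measure how much a metric (here, hotel occupancy) rose during the event week relative to each city’s own baseline, average that uplift, and apply it to the current city’s baseline. The city names, the figures, and the helpers `event_uplift` and `project_event_value` are hypothetical, and a plain averaged ratio is an assumption; a real model would also account for city size, weather, calendar effects and so on.

```python
from __future__ import annotations

from statistics import mean

# Hypothetical data for prior host cities (illustrative, not real figures):
# city -> (baseline hotel occupancy % for that week in a non-event year,
#          hotel occupancy % during the event week)
PRIOR_HOST_CITIES: dict[str, tuple[float, float]] = {
    "host_city_a": (62.0, 93.0),
    "host_city_b": (70.0, 98.0),
    "host_city_c": (55.0, 88.0),
}


def event_uplift(history: dict[str, tuple[float, float]]) -> float:
    """Average ratio of the event-week value to each city's own baseline."""
    return mean(event / baseline for baseline, event in history.values())


def project_event_value(
    current_baseline: float,
    history: dict[str, tuple[float, float]],
    cap: float | None = None,
) -> float:
    """Scale the current host city's baseline by the historical uplift.

    `cap` optionally bounds the result (e.g., occupancy cannot exceed 100%).
    """
    projected = current_baseline * event_uplift(history)
    return min(projected, cap) if cap is not None else projected


if __name__ == "__main__":
    print(round(project_event_value(60.0, PRIOR_HOST_CITIES), 1))
```

With these illustrative numbers the average uplift is 1.5x, so a city whose normal occupancy for that week is 60 percent would project roughly 90 percent; the optional `cap` keeps bounded metrics such as occupancy at or below 100.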
Seasonal Events happen at the same time each year and affect everyone (unlike Annual Events, which primarily impact the host city). For example, we know that a large percentage of retail sales happen between Black Friday and Christmas (see Figure 3).
Other holidays, such as Easter, Halloween, Mother’s Day, Thanksgiving and the 4th of July, all exhibit similar seasonality patterns, although the products and places impacted can vary widely: flowers, greeting cards and restaurants on Mother’s Day; candy and costumes on Halloween; and turkey, pumpkin pies and sporting events on Thanksgiving.
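A common way to capture this kind of seasonality from historical data is a ratio-to-mean seasonal index. The Python sketch below assumes hypothetical quarterly sales for three years (with Q4 containing the Black Friday to Christmas period), computes each quarter’s average size relative to its year’s mean, and uses those indices to spread a projected annual total across quarters. The data and function names are illustrative and are not taken from Figure 3.

```python
from statistics import mean

# Hypothetical quarterly retail sales ($ millions); Q4 covers Black Friday
# through Christmas. Illustrative numbers only, not real data.
QUARTERLY_SALES = {
    2017: [80.0, 90.0, 100.0, 130.0],
    2018: [90.0, 99.0, 110.0, 141.0],
    2019: [96.0, 108.0, 118.0, 158.0],
}


def seasonal_indices(history):
    """Ratio-to-mean seasonal index per quarter, averaged across years."""
    ratios_by_year = []
    for quarters in history.values():
        year_mean = mean(quarters)
        ratios_by_year.append([q / year_mean for q in quarters])
    # zip(*...) lines up the same quarter across all years.
    return [mean(same_quarter) for same_quarter in zip(*ratios_by_year)]


def forecast_quarters(annual_total, indices):
    """Spread a projected annual total across quarters using the indices."""
    base = annual_total / len(indices)
    return [base * idx for idx in indices]


if __name__ == "__main__":
    idx = seasonal_indices(QUARTERLY_SALES)
    print([round(i, 3) for i in idx])
    print([round(v, 1) for v in forecast_quarters(520.0, idx)])
```

Because each year’s ratios average to 1, the indices sum to the number of quarters, so the quarterly forecasts add back up to the annual total; the largest index lands on Q4, reflecting the holiday peak.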
4-Year Events include the US Presidential Elections and the Summer and Winter Olympics. The historical data from previous 4-year events, and the associated consumption propensities, patterns, tendencies and preferences, could be invaluable in making decisions about the operations of your business during one of these events. These 4-year events also tend to play out over an extended period (months, not days), and that can impact nearly every aspect of your organization’s value chain, including promotions, pricing, merchandising, marketing, procurement, manufacturing, inventory management, logistics and staffing (see Figure 4).
Proxy Events are new events for which a proxy (a similar event or use case that can represent the new event) can be used to estimate impact and make operational and performance predictions. Historical data can be invaluable as a proxy for predicting the impact of completely new events, like a new product launch.
That’s exactly what we did in the Hitachi Vantara Project Champagne use case to predict which customers would benefit from our new VSP5000 storage product. Because we did not have any customer usage data for the new product, we used customers’ usage patterns from a similar storage product as a proxy to create 1) a propensity-to-buy score, 2) a survival model to predict when a customer’s demand patterns would indicate a time to buy, and 3) a recommendation engine to identify the right product features for each customer given their usage patterns. The predicted behaviors allowed us to focus Sales and Marketing resources on the customers who would benefit most from the new product (and not waste time and resources on customers for whom the product was not a good fit).
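The text does not describe how the Project Champagne models were built. As a purely hypothetical illustration of the proxy idea, the Python sketch below fits a logistic-regression propensity-to-buy score on customers of a similar product (two made-up usage features and a made-up “bought an upgrade” label), then scores customers of the new product on the same features. The features, data and functions are assumptions, plain batch gradient descent stands in for whatever library a real project would use, and only the propensity score is shown, not the survival model or recommendation engine.

```python
import math

# Hypothetical training data from customers of a *similar* (proxy) product.
# Features: (capacity utilization 0-1, annual I/O growth 0-1).
# Label: 1 if the customer later bought an upgrade, else 0. Illustrative only.
PROXY_CUSTOMERS = [
    ((0.92, 0.40), 1),
    ((0.85, 0.35), 1),
    ((0.78, 0.30), 1),
    ((0.40, 0.05), 0),
    ((0.35, 0.10), 0),
    ((0.55, 0.08), 0),
]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def propensity(x, weights, bias):
    """Propensity-to-buy score in (0, 1) for one customer's feature vector."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


def train_logistic(rows, lr=1.0, epochs=3000):
    """Fit logistic-regression weights by batch gradient descent on log loss."""
    n = len(rows)
    weights = [0.0] * len(rows[0][0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in rows:
            err = propensity(x, weights, bias) - y  # d(log loss)/d(logit)
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


if __name__ == "__main__":
    w, b = train_logistic(PROXY_CUSTOMERS)
    # Score hypothetical customers of the new product with the proxy-trained model.
    new_customers = {"customer_x": (0.90, 0.38), "customer_y": (0.38, 0.06)}
    for name, feats in new_customers.items():
        print(name, round(propensity(feats, w, b), 3))
```

Customers whose scores clear a chosen threshold would be the ones to prioritize for Sales and Marketing attention; the threshold itself is a business decision, not something the model determines.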
Random Events include tornadoes, hurricanes and pandemics. While these events are random, there is a growing number of such cases and a growing body of data from which to infer their personal and economic impact. That is exactly what we are witnessing today with the coronavirus, where data from SARS and the H1N1 swine flu can potentially serve as proxy data.
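One simple way to turn a history of random events into an inference is to treat their occurrence as a Poisson process: estimate the annual rate from past counts, then compute the probability of at least one event in a coming period. The sketch below uses hypothetical yearly counts, and the constant-rate assumption is exactly that, an assumption; if events are becoming more frequent, fitting the rate on a recent window is likely more realistic.

```python
import math
from statistics import mean

# Hypothetical yearly counts of one kind of disruptive random event
# (e.g., severe storms striking a given region). Illustrative values only.
YEARLY_COUNTS = [0, 1, 0, 2, 1, 0, 1, 1, 2, 2]


def poisson_rate(counts):
    """Maximum-likelihood estimate of a Poisson rate: mean events per year."""
    return mean(counts)


def prob_at_least_one(rate, years=1.0):
    """P(at least one event within `years`) under a constant-rate Poisson process."""
    return 1.0 - math.exp(-rate * years)


if __name__ == "__main__":
    lam = poisson_rate(YEARLY_COUNTS)
    print(lam, round(prob_at_least_one(lam), 3))
```

With these counts the estimated rate is one event per year, giving about a 63 percent chance of at least one event next year; fitting only the last five years (`YEARLY_COUNTS[-5:]`) gives 1.2 events per year, a simple way to reflect a rising trend.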
Obsolete Data Summary
I know that the cool thing to discuss in conferences, articles and blogs is real-time data: using sensors to capture humans and/or devices in the act of doing something live. And yes, that’s cool; there are lots of use cases that can take advantage of real-time data, like malware detection, security violations, identity theft, credit card fraud and autonomous vehicles. Those, indeed, are the exciting use cases!
However, do not be too quick to write off that old, “obsolete” data. There are a multitude of use cases (annual events, seasonal events, 4-year events, proxy events and random events) where “obsolete” historical data can be instrumental in helping make informed operational and business decisions.
With respect to data, one use case’s noise is another use case’s signal.