I was deep into a presentation at a major retailer. In the darkened room, a lone hand shot up. “John, we spend 80% of our time on data load and prep. Only 20% is used to produce analytics. We don’t like that ratio.”
The speaker was right. About 80% of the analytics process is spent on data preparation and loading. Numerous examples come to mind. I remember a project for an auto insurance company using telematics and driver behavior. The one-off code to prepare the data took three days to write. That’s typical.
Time spent on data load and preparation, and on writing lines of code, does nothing for the bottom line. Reducing these aspects of analytics saves enterprises money and assets. Wall Street doesn’t measure value by the number of lines of code an organization produces. That’s no way to work.
The three most expensive letters with regard to data are E-T-L (extract, transform, load). Traditional data movement approaches can choke data sources and the network. We must do better. This requires tools that speed up data movement and transformation and techniques like change data capture. Data should be moved through a parallel network or via network data integration. It should be highly available and in a connected environment that is always on and always on time.
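To make change data capture concrete, here is a minimal sketch in Python. It assumes the simplest possible setup: two in-memory snapshots of a table keyed by primary key. It detects changes by comparing the snapshots, whereas production tools typically read the database transaction log instead. The names `capture_changes` and `apply_changes` are hypothetical, chosen for illustration only. The point is that only the rows that changed get moved, not the whole table.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]
Snapshot = Dict[int, Row]          # primary key -> row
Change = Tuple[str, int, Row]      # (operation, primary key, row)


def capture_changes(previous: Snapshot, current: Snapshot) -> List[Change]:
    """Compare two snapshots and return only the rows that differ.

    Illustrative snapshot-diff variant of change data capture; real
    tools usually read the source database's transaction log.
    """
    changes: List[Change] = []
    for key, row in current.items():
        if key not in previous:
            changes.append(("insert", key, row))
        elif previous[key] != row:
            changes.append(("update", key, row))
    for key, row in previous.items():
        if key not in current:
            changes.append(("delete", key, row))
    return changes


def apply_changes(target: Snapshot, changes: List[Change]) -> None:
    """Replay captured changes against a target copy of the table."""
    for operation, key, row in changes:
        if operation == "delete":
            target.pop(key, None)
        else:
            # Inserts and updates both overwrite the row for that key.
            target[key] = dict(row)


if __name__ == "__main__":
    yesterday = {
        1: {"driver": "A", "miles": 120},
        2: {"driver": "B", "miles": 300},
        3: {"driver": "C", "miles": 45},
    }
    today = {
        1: {"driver": "A", "miles": 120},   # unchanged, never moved
        2: {"driver": "B", "miles": 340},   # updated
        4: {"driver": "D", "miles": 10},    # inserted
    }
    warehouse = {key: dict(row) for key, row in yesterday.items()}
    delta = capture_changes(yesterday, today)
    apply_changes(warehouse, delta)
    print(len(delta), "changed rows moved instead of", len(today))
```

In this toy example one of the four distinct rows is never touched. On a large table where only a small fraction of rows change each day, that difference is what keeps the source system and the network from choking.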
Another issue is code surface area. If your analytic platform is based on low-level languages like Java or C++, your analytic agility is reduced significantly. The best way to reduce complexity is to reduce code surface area. The cost to develop, change, test, and support an analytic is directly related to the number of lines of code needed to construct it. Fewer lines of code, or better, a set of simple commands, enable you to be analytically nimble.
Some define big data as data of such size that you cannot afford to land and store it. I define big data as data of such size and complexity that you are unable to find analytic value in it quickly enough to positively impact the business. When analytics is slow and complex, it is treated as a tangent and kept in a lab off to the side; that’s not successful. When analytics is fast, easy, and responsive, it can be leveraged widely. Further, analytic results can then shape business operations and affect the things that matter.
Applying analytics is much harder than creating them. Ensuring that an analytic succeeds takes changes to people and processes, and those are far more difficult than technology changes. Changing people and processes enables an organization to apply analytics to real-world business problems. Having more time to make these changes is paramount to making analytic output valuable.
Flipping the 80/20 Rule
So how do we change the 80/20 rule? We use tools to increase the effectiveness and the efficiency of data preparation. That gets us part of the way there, reducing the 80% by making the job less onerous and manual.
We need to work both sides of the equation: increase the efficiency of data prep and expand the number of people using data. We must democratize analytics by lowering the barriers to advanced analytical topics. With greater efficiency and better tools for more people, we shift the conversation from one that focuses 80% on esoteric technology to one that invests 80% in making things happen.
We are integrating more data sources and analytics techniques than ever before, with a Wild West of tools and techniques: one for text analytics, another for cluster analysis, another for affinity analysis, and another for pathing. With different departments using different tools, you wind up with governance problems that even the best sheriff can’t solve. Companies need a central strategy to flip the 80/20 rule. Think of it as a race with the competition; the first one to flip it wins. And you won’t flip it without centralizing your strategy and rationalizing data across the organization.
Flipping the rule will mean more data-driven decisions. Today, effort is invested in creating analytics that support high-value decisions. Reducing the grunt work in data preparation and increasing the use of analytics by more stakeholders help make that $5 million decision the right one, and also help make 10,000 $500 decisions better.
Originally posted on Forbes, with a different picture. The new picture was added by the editor.