Becoming a data- and evidence-driven organization provides a significant competitive advantage. Speed and accuracy of insight, delivered across any device including smartphones and tablets, means organizations can make better, faster decisions. The goal is for the entire organization to become data driven - guided by facts, evidence, statistics and analysis. This is the secret success sauce of Google, Walmart, Goldman Sachs, and others.
A critical step to becoming a data- and evidence-driven organization is mastering Data Fundamentals:
- How to understand the context (what process is the data supporting? First make sure that you have the full picture)
- How to simplify the process to make the data more manageable (there’s more data than capability to manage)
- How to integrate the data (across channels, applications and devices)
- How to improve the quality of data (spot issues and gaps using profile, cleanse, enrich, match and scorecard techniques)
- How to leverage the data (information about consumers, markets, opportunities)
- How to store the data (private cloud, public cloud, dedicated storage cloud)
- How to retrieve and visualize the data (especially from mobile devices)
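As one illustration of the data-quality item above, the profile and scorecard techniques can be sketched in a few lines of Python. This is a minimal sketch, not a reference implementation: the function names `profile` and `scorecard`, the sample customer records, and the 90 percent completeness threshold are all assumptions made for the example.

```python
def profile(records, fields):
    """Profile: report completeness and distinct-value counts per field."""
    total = len(records)
    report = {}
    for field in fields:
        values = [r.get(field) for r in records]
        # Treat None and empty strings as missing values.
        present = [v for v in values if v not in (None, "")]
        report[field] = {
            "completeness": len(present) / total if total else 0.0,
            "distinct": len(set(present)),
        }
    return report


def scorecard(report, threshold=0.9):
    """Scorecard: True if a field meets the (assumed) completeness threshold."""
    return {
        field: stats["completeness"] >= threshold
        for field, stats in report.items()
    }


# Hypothetical sample data with two deliberate gaps.
customers = [
    {"id": 1, "email": "ann@example.com", "country": "US"},
    {"id": 2, "email": "", "country": "US"},
    {"id": 3, "email": "cy@example.com", "country": None},
    {"id": 4, "email": "dee@example.com", "country": "DE"},
]

report = profile(customers, ["id", "email", "country"])
card = scorecard(report)
print(report)
print(card)
```

Fields that fail the scorecard (here `email` and `country`) are the candidates for the cleanse and enrich steps.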
When first planning a data analytics strategy, it is prudent to start from a business perspective rather than from technology. Have the CIO, data scientists and business people agree on the business objectives and the value that can be derived, then work backwards from there.
Carefully planning all the dimensions of information management for the organization is critical. A well thought out, flexible information architecture is required before attempting to get value from the data.
It is a mistake to manage data by focusing narrowly on high volumes of information (storage, transform/transport, analysis) to the exclusion of the many other dimensions of information management. This can create massive problems down the road. A narrow focus can result in short-sighted decisions that will hamper the information architecture when the need arises to expand and change it to meet changing business needs.
Too narrow a focus will force massive reinvestment in two to three years to address the other dimensions of big data.
Data needs to be architected so that it reaches users through a multiplicity of organizational data structures, each tailored to the type of content it contains and the type of user who wants to consume it.
A well-designed data governance program makes a huge positive difference. Get good professional help to design a well-thought-out data governance system. This is a strategic point for big investment - spend time and resources to get it right. If you screw up data governance, the entire data storage and analytics program is compromised. The lack of proper data management and data-quality tools may completely derail what you can achieve with the faster and more advanced analytics tools available in the market today.
Any data that directly affects top-line revenue is valuable data, regardless of size. Anything that can help increase revenue or decrease costs is valuable. Yet the large volumes of data collected from many different sources make the data-cleaning process more difficult.
When analyzing big data sets, it makes sense to define small, high-value opportunities and use those as a starting point. Define your business objectives in meticulous detail.
As you expand data sources and create the analytical models that will uncover patterns, be vigilant about homing in on those patterns that are most important to the stated business objectives.
Focus on the volume, variety and velocity of information:
Volume. Many factors contribute to the increase in data volume – transaction-based data stored through the years, text data constantly streaming in from social media, increasing amounts of sensor data being collected, etc. In the past, excessive data volume created a storage issue. But with today's decreasing storage costs, other issues emerge, including how to determine relevance amidst the large volumes of data and how to create value from data that is relevant.
Variety. Data today comes in all types of formats – from traditional databases to hierarchical data stores created by end users and OLAP systems, to text documents, email, meter-collected data, video, audio, stock ticker data and financial transactions. By some estimates, 80 percent of an organization's data is not numeric! But it still must be included in analyses and decision making.
Velocity. Velocity means both how fast data is being produced and how fast the data must be processed to meet demand. RFID tags and smart metering are driving an increasing need to deal with torrents of data in near-real time. Reacting quickly enough to deal with velocity is a challenge to most organizations.
Remember to consider how the consumers of data will use it - especially the data scientists:
Step 1. Organize Data.
Organizing data involves the physical storage and format of data, incorporating best practices in data management.
Step 2. Package Data.
Packaging data involves logically manipulating and joining the underlying raw data into a new representation and package.
Step 3. Deliver Data.
Delivering data involves ensuring that the message in the data reaches the people who need to hear it.
In addition, have answers to these questions at every step:
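The three steps above can be sketched as a small pipeline. This is only an illustration under stated assumptions: the function names `organize`, `package` and `deliver`, the order records, and the per-customer revenue summary are hypothetical and chosen to keep the example short.

```python
def organize(raw_rows):
    """Step 1: store rows in one consistent format (clean names, numeric amounts)."""
    return {
        row["order_id"]: {
            "customer": row["customer"].strip().lower(),
            "amount": float(row["amount"]),
        }
        for row in raw_rows
    }


def package(orders):
    """Step 2: combine the raw orders into a new representation: revenue per customer."""
    totals = {}
    for order in orders.values():
        totals[order["customer"]] = totals.get(order["customer"], 0.0) + order["amount"]
    return totals


def deliver(totals, audience):
    """Step 3: render the packaged data for the people who need it."""
    lines = [f"Revenue summary for {audience}"]
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    for customer, amount in ranked:
        lines.append(f"{customer}: {amount:.2f}")
    return "\n".join(lines)


# Hypothetical raw input with inconsistent formatting.
raw = [
    {"order_id": "A1", "customer": " Acme ", "amount": "120.50"},
    {"order_id": "A2", "customer": "acme", "amount": "79.50"},
    {"order_id": "A3", "customer": "Globex", "amount": "50"},
]

packaged = package(organize(raw))
summary = deliver(packaged, "sales team")
print(summary)
```

Keeping the steps as separate functions means the same organized data can be packaged and delivered differently for different consumers.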
- What is being created?
- How will it be created?
- Who will be involved in creating it?
- Why is it to be created?
Consider the "8 Levels of Analytics" when organizing and managing data:
1. Standard reports - what happened?
2. Ad hoc reports - how many, how often?
3. Query drilldown (or OLAP) - where exactly is the problem?
4. Alerts - what actions are needed now?
5. Statistical analysis - why is it happening? What opportunities am I missing?
6. Forecasting - what if these trends continue?
7. Predictive modeling - what will happen next?
8. Optimization - how do we do things better? What is the best decision for a complex problem?
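To make the progression concrete, here is a minimal sketch of three of these levels (standard report, alert and forecast) applied to the same series. The monthly sales figures, the alert floor, and the choice of a simple least-squares linear trend for the forecast are all assumptions for illustration; real forecasting usually needs a more careful model.

```python
def standard_report(series):
    """Level 1: what happened? Summarize the history."""
    return {"total": sum(series), "average": sum(series) / len(series)}


def alert(series, floor):
    """Level 4: is action needed now? True if the latest value is below the floor."""
    return series[-1] < floor


def forecast_next(series):
    """Level 6: what if the trend continues? Extend a least-squares line one period.

    Requires at least two data points.
    """
    n = len(series)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(series) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, series)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return intercept + slope * n


# Hypothetical monthly sales figures.
monthly_sales = [100.0, 110.0, 120.0, 130.0]

print(standard_report(monthly_sales))
print(alert(monthly_sales, floor=90.0))
print(forecast_next(monthly_sales))
```

Each level reuses the same underlying data, which is why the way data is organized and stored affects every level above it.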
Data Complexity: Huge volumes of data typically come from multiple sources. It is quite an undertaking to link, match, cleanse and transform data across systems. However, it is necessary to connect and correlate relationships, hierarchies and multiple data linkages, or your data can quickly spiral out of control.
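A minimal sketch of linking and matching records across two systems follows. It assumes, purely for illustration, a CRM export and a billing export that share an email address as the matching key; the field names and the `link_records` helper are hypothetical. Real matching often needs fuzzier rules than exact comparison on a cleansed key.

```python
def normalize_email(value):
    """Cleanse: trim whitespace and lowercase so equivalent emails compare equal."""
    return value.strip().lower()


def link_records(crm_rows, billing_rows):
    """Match CRM rows to billing rows on the normalized email key.

    Returns (linked, unmatched) so gaps between the systems stay visible.
    """
    billing_by_email = {normalize_email(r["email"]): r for r in billing_rows}
    linked, unmatched = [], []
    for row in crm_rows:
        key = normalize_email(row["email"])
        match = billing_by_email.get(key)
        if match is None:
            unmatched.append(row)
        else:
            linked.append(
                {"email": key, "name": row["name"], "balance": match["balance"]}
            )
    return linked, unmatched


# Hypothetical exports from two systems with inconsistent formatting.
crm = [
    {"name": "Ann", "email": "Ann@Example.com "},
    {"name": "Bo", "email": "bo@example.com"},
]
billing = [
    {"email": "ann@example.com", "balance": 42.0},
]

linked, unmatched = link_records(crm, billing)
print(linked)
print(unmatched)
```

Returning the unmatched rows alongside the linked ones makes it easier to see where the two systems disagree.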
Data governance can help you determine how disparate data relates to common definitions and how to systematically integrate structured and unstructured data assets to produce high-quality information that is useful, appropriate and up-to-date.
Data obesity is a controversial issue. Some say save and store all the data for the data scientists to mine for gold. Others say find the relevant data and toss the rest.
Organizations ask interesting questions:
- What if your data volume gets so large and varied you don't know how to deal with it?
- Do you store all your data?
- Do you analyze it all?
- How can you find out which data points are really significant?
- How can you use it to your best advantage?
Considering that storage is cheap, saving and storing all data for data scientists to consider is the best strategy. Yes, not all data will be relevant or useful. But how can you find the data points that matter most? This is difficult and perhaps in many cases impossible.
Further, the data scientists may search for patterns and relationships in the data that you may not know or understand. In other words, data you classify as fat may be considered muscle by data scientists. Err on the side of retaining any and all data.
It may make sense to move data out of the data warehouse and store it elsewhere to get optimal performance.