Search any data related posting and you’ll soon be up to your eyeballs in reports on the promise of the new data era, techniques to help build a better data engine or incorporate new data widgets, and infographics and visualizations showing the kinds of insights you can get with the right mix of techniques.
Got it. Data is big. It’s expected to grow in volume to 35 to 45 zettabytes by 2020. As in seven sets of three zeros big.
It’s also transforming business processes big. Everyone is being expected to take advantage of data, no matter what their size, mission, or budget. Decision makers, including outside investors, want the organizations they invest in to use data to show their value beyond what the financials show. And infographics that mean something are becoming the norm.
But with limited resources, what can small and medium enterprises, government shops with minimal budgets, and non-profits do with data? How are you going to choose an approach that combines the right mix of data and tools to provide insight relevant to your organization? And where is it going to come from? Finding the specific, relevant data you need, let alone getting it ready for use with the different tools and technologies that interest you, can become very costly, very quickly if you’re not careful.
When colleagues and I were working on a project early in our data careers, we didn’t think we needed to worry about the data that would fuel our approach and ended up with an overage bill from a data vendor for US$13 million. (Thank goodness for solid contracts!) We worked with another organization that had already invested big money in a technical approach without first considering where the data would come from. It was only after the fact that they realized how expensive it would be to find the data to implement their idea. Luckily they had the bandwidth to recover, but not without a major impact on their budget and timeline.
Hopefully, you’re thinking about how you can transform your business processes to take full advantage of the data age. And if you are, consider the following about data BEFORE you invest in any particular suite of tools:
1. Think about what you need your data to measure
For many organizations, it isn’t easy to build metrics that provide insight and capitalize on outputs from data driven processes. Certainly computer networks and analytic engines can spew out numbers, trend lines, percentage breakdowns, and other statistical reports as needed. But what in those numbers is really going to matter? Building good metrics that work with data driven approaches is as much a knowledge challenge as a technical one. It involves understanding what in the data is important to capture, knowing what kinds of metrics you can get from different combinations of data sources and analytic tools, and building system platforms that will sustain them.
The best metrics will be designed in collaboration with members of technical and operational teams across your organization. They will measure the right mix of performance, effectiveness, usability, and impact, whether based on technical, financial, or other operational functions. And they will be based on understanding both the data that you can get and the outputs and insights that different combinations of tools can give.
2. Make sure your team members understand what different members mean by “data”
There are a lot of assumptions about data and what it means. Technically and mathematically inclined staff will likely think about data in terms of how machine-ready it is, and whether the contents of fields in a database or spreadsheet can be computationally manipulated through your computer systems to get multiple, automated jobs done. Operational and analytical staff will more likely think of data as pieces of information – whether it comes in the form of interactions with or observations of others, audio or video clips, images, words on a page, or numbers in a spreadsheet. This is information in terms of stuff they can research, collect, process and analyze with their own brains using the frameworks, methodologies, and insight they learned as part of their professional practice and education. Many of them will not be aware that computers process data much more simply, and that data has to be broken down into 1s and 0s before processing can occur.
The fundamentally different frameworks that people in your organization use to understand data can lead to a lot of confusion about what data driven approaches may be appropriate to your business process. But you need the technical, operational, and analytical understanding of data, as well as the decision makers’ perspective, if you are going to get full value out of any approach. And you’re also going to need to understand what is doable now and what is still being researched and tested in advanced laboratories.
To that end, it’s important for everyone in your organization to understand data strengths and limitations from different viewpoints and to use that information to figure out what you need to do now, given your budget and resources, and what you can plan to implement in the future.
3. Think about how you already use data in your organization
No doubt you are already using data and have developed a series of techniques and methodologies to capture, analyze, and turn that data into information. Some of your processes and the data that supports them are likely documented and clean and ready to be used in data driven processes as is. Some may need more thought about how to formalize and incorporate them into any data driven approach. But likely as not, many of the data sources and techniques that you already use can be incorporated into a transformative data driven process.
Understanding what you already have to work with – including which parts of your data process can be fully or partially automated, where you will need human inputs for higher-level decision making, and where you have data gaps – will go a long way towards helping you figure out what you can already leverage into a data driven process before you invest.
4. Consider what other data might be out there that you can get to enrich your data process
There are thousands of open data portals, subscription services, and proprietary data sources available to you. But the number of data sources that are relevant to your field, let alone to any specific function or project, will undoubtedly be limited. And depending on the type of data you need, you will likely find that a lot of the data you really need for decision making will need to be processed before it can be used with your tools.
If you’re looking to leverage government funded open data, there is definitely plenty of machine-ready data available, although figuring out what is relevant and aligning it to your needs will be a factor. If marketing, product, or financial data is what you want to use, that also tends to be structured and available, and expensive. Other data that you might need will be proprietary, firewalled, protected by privacy laws, or dependent on human collection, human production, or human interpretation. And certainly there is plenty of unstructured, highly specific and highly localized data that you can play with, assuming you have the bandwidth to incorporate and research the impact of different advanced analytic approaches in your data process.
Exploring what is out there that may be relevant to your decision making before you start will guide your choice of appropriate tools and techniques. It will also help you understand what is doable now with current tools, and what might be doable in the near future as innovations in data technologies emerge out of research labs and come of age.
5. Think about how much it will cost to transform the data so you can use it with your tools of choice
Data isn’t free. Not even the free stuff. For every piece of data or record that you want to use, there is a cost to preparing it so it can be ingested into, managed by, accessed from, and output to services and tools within your IT system. Depending on the type and quantity you need, what it looks like, how you will need to collect and process it, and what licensing and intellectual property issues you must contend with, costs can add up very, very quickly. And this is before you consider any costs for the technical tools that you might use to exploit the data you need.
Luckily, there are an increasing number of cheaper and cheaper tools you can use to get data into your system – e.g., data collection and mining tools, crowdsourcing options, drones, and opportunities to collect data from humans as they interact with tools and technology. And new technologies that make it easier to take advantage of data are emerging every day. Understanding what it will cost to extract, transform, and load that data into whatever tools you want to use will go a long way toward making sure the data driven process you implement will work for your organization.
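To make the arithmetic concrete, here is a minimal back-of-the-envelope sketch in Python. Everything in it is an assumption for illustration: the source names, record counts, per-record preparation costs, and license fee are hypothetical placeholders, and the simple formula (records times preparation cost per record, plus any flat license fee) is just one rough way to frame the estimate, not a costing standard.

```python
from dataclasses import dataclass


@dataclass
class DataSource:
    """One candidate data source. All figures are hypothetical placeholders."""
    name: str
    records: int                  # number of records you expect to use
    prep_cost_per_record: float   # assumed cost to extract, transform, and load one record
    license_fee: float = 0.0      # assumed flat licensing or subscription fee


def total_cost(source: DataSource) -> float:
    """Rough cost of a source: per-record preparation plus any flat license fee."""
    return source.records * source.prep_cost_per_record + source.license_fee


def estimate(sources: list[DataSource]) -> dict[str, float]:
    """Return the estimated cost of each source, keyed by its name."""
    return {s.name: total_cost(s) for s in sources}


# Made-up example: a "free" open data set that needs heavy cleanup,
# and a paid vendor feed that arrives nearly ready to use.
sources = [
    DataSource("open_government_data", records=200_000, prep_cost_per_record=0.02),
    DataSource("vendor_market_data", records=50_000, prep_cost_per_record=0.005,
               license_fee=12_000.0),
]

costs = estimate(sources)
for name, cost in costs.items():
    print(f"{name}: ${cost:,.2f}")
print(f"total: ${sum(costs.values()):,.2f}")
```

Even with made-up numbers, the free source carries a real preparation cost. Swapping in your own record counts and per-record estimates is a quick way to compare options before you commit to a tool.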
6. Think big, pilot small, then scale up accordingly
In a world of limited resources, it makes sense to think big about possibilities and test different options on a small scale before you settle on any single approach. By starting small, you might find that the cheapest option is good enough as is or, alternatively, that it needs significant, costly pre-processing to make it usable. You might also find that many solutions that work at small scale will not be effective or cost efficient at larger scales. This may mean your organization’s processes themselves will need to be adapted, or it may mean that you should look for another option. Working out the kinks before you invest will help you calculate the ultimate costs of data, tool licenses, and everything else involved in building out your system, training users, and establishing new protocols.
These steps will take some time and resources to go through. However, a small investment at the beginning to make sure you get the right combination of data and tools to drive your process will save considerably more over the long term. And that is always good for your organization’s bottom line.