Guest blog from David Lefkowich at FreeSight

Today’s world is split-second. Whether you’re pitching an idea, selling a product, or presenting last quarter’s results, if you don’t grab your audience’s attention in the first moment, you’ve lost them.

The problem is that our focus on speed and split-second sound bites leaves us no time to validate the truth or substance behind the flash and hype of words and striking visualizations. We are conditioning ourselves to focus on the story, or on the emotion the presenter is trying to evoke, rather than on challenging the results themselves.

In the world of data, this means that an amazing visualization can be built on bad data and present inaccurate results, which in turn leads to incorrect conclusions that can negatively impact your business.

People who have their hands in the data know this. They know how frequently data errors slip through the cracks and find their way into reporting. It takes a meticulous set of eyes and processes to catch them.

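To make that “meticulous set of eyes” concrete, below is a minimal sketch of the kind of automated sanity checks that catch such errors before they reach a report. It assumes pandas, and the file and column names are hypothetical.

    import pandas as pd

    # Hypothetical extract; substitute your own file and column names.
    df = pd.read_csv("quarterly_sales.csv")

    problems = []

    # Duplicate rows silently inflate totals.
    dupes = int(df.duplicated().sum())
    if dupes:
        problems.append(f"{dupes} duplicate rows")

    # Missing values in key fields make aggregates misleading.
    for col in ["region", "revenue"]:
        missing = int(df[col].isna().sum())
        if missing:
            problems.append(f"{missing} missing values in '{col}'")

    # Out-of-range values usually point to entry or unit errors.
    if (df["revenue"] < 0).any():
        problems.append("negative revenue values")

    print("\n".join(problems) or "No obvious data errors found.")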

If the recent surge in blogs and articles about the value of good data preparation is any indicator (a web search on “data prep” will suffice), the buzz in the world of data and BI is shifting from the flash of presentation to ensuring that the data behind the visualizations and stories is accurate. All of the BI and data visualization vendors seem to be commenting on, and trying to move into, the not-so-flashy world of data prep: data cleaning, normalization, and the like.

Data prep is not easy, and despite claims of ease, agility, and automation, none of today’s (or, likely, tomorrow’s) tools will replace every human link in the chain of data prep. Whether during the manual steps of data entry at the source or the aggregation and reporting steps at the back end, someone will eventually have to have her or his hands in the data itself, cleaning it up by hand or writing scripts (or the equivalent) to do the cleanup.

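As an illustration of that hands-in-the-data work, here is a sketch of a typical one-off cleanup script; the column names and normalization rules are invented for the example.

    import pandas as pd

    df = pd.read_csv("raw_export.csv")  # hypothetical raw extract

    # Normalize inconsistent manual entry ("NY ", "ny", "New York", ...).
    df["state"] = (df["state"].str.strip()
                              .str.upper()
                              .replace({"NEW YORK": "NY"}))

    # Coerce amounts typed with currency symbols into numbers;
    # anything unparseable becomes NaN and is left for review.
    df["amount"] = pd.to_numeric(
        df["amount"].astype(str).str.replace(r"[$,]", "", regex=True),
        errors="coerce")

    # Drop exact duplicates introduced by repeated exports.
    df = df.drop_duplicates()

    df.to_csv("clean_export.csv", index=False)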

Vendors recognize this, which is why descriptive words like “self-service” now appear in almost all of their marketing materials.

Data prep is a tough nut to crack. However, here are four suggestions that might help you address the issues that can arise as you shift your focus to it:

Appropriate resources: Recognize that ensuring data accuracy is critical to your business and that it doesn’t happen by itself. It is therefore essential that you provide appropriate resources, in both personnel and data prep tools, to do it. Current visualization tools and other means of reporting the data are fabulous, but first you have to make sure that the data is accurate.

Accessible tools: Keep in mind who on your staff will be doing the actual “hands-in-the-data” work or reporting. What level of technical proficiency is required to use the data prep tool(s) you select? Find a tool, or set of tools, that is easy to install, access, learn, and use. Otherwise it simply won’t be used, and people will default to what they already know, even if it is slower, more manual, or more error-prone.

Transparency: Make sure that there is appropriate governance, transparency, and auditability within the tool(s) or your process. At any time, you (or your data prep tools) need to be able to answer, precisely, the question, “What, exactly, is this number and how did it get here?” (One lightweight way to keep such an audit trail is sketched after these suggestions.)

Cost: At the extreme, the value of having accurate data in your reporting is most obvious when you get caught having presented inaccurate data. However, that is a difficult ROI metric to present in the hypothetical. Easier and more palatable measures of ROI are the amount of time saved and the number of data errors found and fixed when you use a tool to simplify or automate your data cleaning and prep tasks. Ask your vendor to work with you on a trial so that you can compare the cost and time of doing a task manually (or with your current tool) against the cost and time of doing it with the tool you are considering. (A toy version of that arithmetic also appears after these suggestions.)

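On the transparency point, one lightweight way to keep that kind of audit trail is to record every transformation as it is applied. The sketch below uses invented step names and a hypothetical file; a dedicated data prep tool would track this for you.

    import pandas as pd

    audit_log = []

    def step(description, df, fn):
        """Apply one transformation and record its effect on the data."""
        before = len(df)
        out = fn(df)
        audit_log.append(f"{description}: {before} -> {len(out)} rows")
        return out

    df = pd.read_csv("raw_export.csv")  # hypothetical file
    df = step("drop duplicates", df, lambda d: d.drop_duplicates())
    df = step("remove negative amounts", df, lambda d: d[d["amount"] >= 0])

    # "What, exactly, is this number and how did it get here?"
    print(f"total = {df['amount'].sum()}")
    for entry in audit_log:
        print(entry)
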
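And on the cost point, the before/after comparison from a trial reduces to simple arithmetic. Every number below is a placeholder to be replaced with measurements from your own trial.

    # Placeholder figures; substitute your own trial measurements.
    manual_hours_per_month = 20   # assumed current manual effort
    tool_hours_per_month = 4      # assumed effort with the tool
    hourly_cost = 60              # assumed loaded cost per hour
    tool_cost_per_month = 300     # assumed monthly license cost

    savings = (manual_hours_per_month - tool_hours_per_month) * hourly_cost
    net = savings - tool_cost_per_month
    print(f"monthly labor savings: ${savings}; net of license: ${net}")
    # With these numbers: (20 - 4) * 60 = 960, and 960 - 300 = 660.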

David Lefkowich is the VP of Sales and Marketing for FreeSight Software, a data integration, cleaning, analysis, and reporting tool (www.freesightweb.com).

Comment by Akira Oyama on October 2, 2015 at 6:03am

Agree. Getting good data requires a lot of work (often tedious) and knowledge. A person has to know the different data sources in order to pull accurate, clean data, which may involve going outside of what’s available in a corporate data warehouse. Also, data fields are often populated with incorrect information, and unless a person has extensive industry experience, he or she will fail to question the integrity of the data, not knowing any better. As you said, despite many companies’ claims of ease, agility, and automation, none of today’s (or, likely, tomorrow’s) tools will replace every human link in the chain of data prep. I could not agree more. The deep industry experience and knowledge required to do good data preparation cannot easily be replaced with a simple software tool.
