
As with any business initiative, a Big Data project involves an element of risk. Any project can fail for any number of reasons: bad management, under-budgeting or a lack of relevant skills. However, Big Data projects, by their nature, bring their own specific risks.

Because of the advanced technology often needed, and the relative newness of the skill sets required to truly “think Big” (or, as I prefer to say, “think Smart”) with data, care must be taken at every step to avoid pitfalls that could lead to wasted time and money, or even legal hot water!

Business people are used to taking risks – assessing those risks and safeguarding against them comes naturally, or we don’t stay in business for long! So there’s no need to be scared of Big Data. But of course we always need to be aware of dangers that could potentially arise if we fail to cover all of the bases.

Here are the five biggest risks of Big Data projects – a simple checklist that should be taken into account in any strategy you are developing.

Security

An obvious one, and often uppermost in our minds when we consider the logistics of data collection and analysis. Data theft is a rampant and growing area of crime, and attacks are getting bigger and more damaging. In fact, five of the six most damaging data thefts of all time (eBay, JP Morgan Chase, Adobe, Target and Evernote) were carried out within the last two years. The bigger your data, the bigger the target it presents to criminals with the tools to steal and sell it. In the case of Target, hackers stole the credit and debit card information of 40 million customers, as well as personally identifying information such as the email and postal addresses of up to 110 million. Last year, a US court ruled that everyone affected could claim up to $10,000 in compensation, leaving Target facing a hefty bill!

Privacy

Closely related to security is privacy. As well as ensuring personal data is safe from criminals, you need to be sure that the sensitive information you collect and store won’t be divulged through misuse that is less malevolent but equally damaging, whether by you or by the people you have made responsible for analyzing and reporting on it. Failing to follow applicable data protection laws can lead to expensive lawsuits and even prison, depending on what sort of data you are using and which jurisdiction you are in. Last year, private hire and car-sharing service Uber stirred up controversy when one of its executives was caught using the service’s “God mode” to track the movements of BuzzFeed journalist Johana Bhuiyan.

Costs

Data collection, aggregation, storage, analysis and reporting all cost money. On top of this there will be compliance costs, incurred to avoid falling foul of the issues I raised in the previous point. These costs can be mitigated by careful budgeting during the planning stages, but getting the budget wrong at that point can lead to spiralling costs, potentially negating any value your data-driven initiative adds to your bottom line. This is why “starting with strategy” is so vital. A well-developed strategy will clearly set out what you intend to achieve and the benefits that can be gained, so they can be balanced against the resources allocated to the project. One bank I worked with was so worried about the cost of storing and maintaining all the data it was collecting that it was considering pulling the plug on one particular analytics project, as the costs looked likely to exceed any potential savings. By identifying and eliminating irrelevant data from the project, it was able to bring costs back under control and achieve its objectives.

Bad Analytics

In other words, “getting it wrong”. Misinterpreting the patterns shown by your data, and drawing causal links where there is in fact only coincidence or a hidden common cause, is an obvious pitfall. Sales data may show a rise following, say, a major sporting event, prompting you to draw a link between sports fans and your products or services, when in fact the rise is purely down to there being more people in town and would be equally dramatic after a large live music event. In addition, care must be taken to avoid confirmation bias, which easily creeps in when an analyst comes to a project with preset ideas about what they are looking for and, as a result, is blind to insights from the data that contradict those preconceived notions. The only way to mitigate this is to implement established best-practice procedures from top to bottom throughout your project. Google’s Flu Trends project is a good example of analytics going wrong. Designed to produce accurate maps of flu outbreaks based on the searches being made by Google users, it provided compelling results at first. However, as time went on, its predictions diverged increasingly from reality. It turned out that the algorithms behind the project just weren’t accurate enough to pick up anomalies such as the 2009 H1N1 pandemic, vastly reducing the value that could be gained from them.
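To make the sporting-event example concrete, here is a minimal Python sketch. The figures in DAYS are entirely invented for illustration, and the names DAYS, avg_sales and avg_sales_per_visitor are hypothetical. Raw sales look far higher on sports days, but once sales are divided by the number of visitors in town, sports days, concert days and ordinary days look the same, which is the “more people in town” explanation. Dividing by visitors is only a crude adjustment that assumes sales scale linearly with foot traffic; a real analysis would need a proper model of the confounding factors.

```python
from statistics import mean

# Hypothetical daily figures, invented purely for illustration:
# (event held in town, visitors in town, units sold).
# In this toy data, sales depend ONLY on how many people are in town.
DAYS = [
    ("none", 10_000, 500),
    ("none", 11_000, 550),
    ("none", 9_000, 450),
    ("sports", 30_000, 1_500),
    ("sports", 25_000, 1_250),
    ("concert", 30_000, 1_500),
]


def avg_sales(event: str) -> float:
    """Naive comparison: average raw sales on days with the given event."""
    return mean(sales for e, _, sales in DAYS if e == event)


def avg_sales_per_visitor(event: str) -> float:
    """Adjusted comparison: average sales per visitor, removing the crowd-size effect."""
    return mean(sales / visitors for e, visitors, sales in DAYS if e == event)


if __name__ == "__main__":
    for event in ("none", "sports", "concert"):
        print(f"{event:>8}: raw = {avg_sales(event):7.1f}, "
              f"per visitor = {avg_sales_per_visitor(event):.3f}")
```

Looking only at the raw column would suggest sports fans love the product; the per-visitor column shows that a concert drawing the same crowd has exactly the same effect.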

Bad Data

I’ve come across many data projects that immediately start off on the wrong foot by collecting irrelevant, out-of-date or erroneous data. Again, this usually comes down to too little time being spent on designing the project strategy. The Big Data gold rush has led to a “collect everything and think about analyzing it later” approach at many organizations. This not only adds to the growing cost of storing the data and ensuring compliance, but also produces large amounts of data that can become outdated very quickly. The real danger here is falling behind your competition: if you are not analyzing the right data, you won’t draw the insights that provide value. Meanwhile, your competitors will most likely be running their own data projects, and if they’re getting it right, they’ll take the lead. A healthcare client I recently worked with created a 217-page report for senior management. Much of the data in the report was valuable, but it was drowned out by irrelevant background noise. Working with them, I was able to show them how to cut the report down to 20 pages, mostly infographics, which clearly showed the relevant data while omitting a lot of the noise.

That’s just a simple checklist of the risks that every Big Data project needs to take into account before a single cent is spent on infrastructure or data collection. This article certainly isn’t meant to scare anyone. I firmly believe that businesses of all sizes should engage wholeheartedly with Big Data projects; at this stage, if they don’t, they run the serious risk of being left behind! But it always pays to be aware of the risks and to enter the fray with your eyes wide open.

© 2017 Data Science Central