Top mistakes data scientists make

The rise of the data scientist continues, and social media is filled with success stories. But what about those who fail? There are no cover articles about the many data scientists who don’t live up to the hype and don’t meet the needs of their stakeholders.

The job of the data scientist is to solve problems, and some data scientists can’t solve them. They either don’t know how to, or they are obsessed with the technology part of the craft and forget what the job is all about. Some get frustrated that “those business people” are asking them to do “simple trivial data tasks” while they’re working on something “really important and complex”. There are many ways a data scientist can fail; here is a summary of the top three mistakes that lead straight to failure.

Mistake #1 – Less communication is better

What I have seen in great data scientists is that they are communicators first and data geeks second. A very common mistake data scientists make is avoiding business people at all costs: they keep interactions with them to a minimum so they can go back and do “cool geek stuff”. Now, I really like the geeky part of the work, I do. That’s why I got into the field in the first place. But we are hired to solve problems, and without communication those problems won’t be solved. Data scientists must report on the progress of their analysis and collect feedback from their peers all the time, especially when they don’t find anything peculiar (maybe that’s good news?). Collecting feedback is not enough, though; the analysis and its assumptions also have to be adjusted based on that feedback. This is the “science” in “data science”: the scientific method is founded on the principle of revising hypotheses in light of new data. And the only way to collect and interpret new data is by communicating with the stakeholders who defined the hypothesis in the first place!

Mistake #2 – Delaying simple data requests from business teams

This is a golden one: simple data requests drive data scientists crazy (“it’s just 30 lines of SQL code, yuck!”). And this is where they fail. The request might be trivial for a data scientist, but the data might have only just become available, and it might solve a problem that has dragged on for years. The data scientist, however, tends to think like an engineer (“trust me, I’m an engineer”) and tries to build scalable architectures to support long-term solutions. Business people don’t care about architecture, scale or engineering; they only care about insights, actionable insights. If you’re not providing them, you fail in their eyes. And, well, they do the sales, so their decisions matter. If you don’t help improve those decisions, you’re just a sunk cost, and finance theory has some pretty rough advice on how to deal with those. Don’t ignore the simple requests. First make sure the request supports a decision and that the decision will improve the business once it has the data. When it does, swallow your pride and run those trivial 30 lines of SQL code: you’ll turn into a high-ROI unit instead of a sunk cost.

Mistake #3 – Preferring a complex solution over an easy one

This is a very costly mistake, and a whole mantra has been built around it in the data scientist occupation. The depiction of data scientists as ultimate geniuses who can code, do math and statistics, and understand business better than most has done the field a big disservice. The expectation becomes a perverse one: data scientists think they need to solve problems by applying top-of-the-line statistical and computer science methods. Ultimately you end up in a situation where junior data scientists think everything can be solved with deep learning and don’t know how to explore the data, because the industry sold them the complexity obsession. Basic data exploration and visualization are the main tools of a data scientist, and you will spend most of your time exploring data. Not building machine learning models, unless you’re hired exclusively to do so. Not building back-end architectures that scale. Not writing a 10-page in-depth hypothesis-testing report for a simple business question, unless you’re hired for that or were specifically asked to do it. Your main role is discovering actionable insights and sharing them as recommendations with your stakeholders. Don’t over-complicate an already complex field with too many superstitions.

So how do I succeed as a data scientist?

As with every field, there are many ways to succeed and to fail, and many mistakes need to be made to understand which are which, but the fundamental lessons can be learned without trial and error. What matters most is being passionate about the problems and building solutions for your stakeholders instead of obsessing over tools and geeky stuff. Unless your role is an engineering one where you are not required to interact with other human beings, you will have to deal with human-to-human communication and run very simple (trivial, in your mind!) code that delivers an unattractive 3×3 data table. But sometimes simple is better, and it’s all that is needed: “everything should be made as simple as possible, but not simpler”, as the famous scientist Albert Einstein is often quoted as saying.
