“Data scientist” is an umbrella term for people whose main responsibility is leveraging data to help other people (or machines) make more informed decisions. The spectrum of data scientist roles is so broad that I will save that discussion for my next post. What I really want to focus on here is what distinguishes a great data scientist.
Over the years that I have worked with data and analytics, I have found that this has almost nothing to do with technical skills. Yes, you read that right. Technical knowledge is a must-have if you want to get hired, but that’s just the absolute minimum requirement. The traits that make someone a great data scientist are mostly non-technical. So what are the 3 key things that distinguish a great data scientist?
This one is so fundamental, it is hard to believe it’s so simple. Every occupation has this curse: people tend to focus on tools and processes or, more generally, to put form over content. A very good example is the ongoing debate over whether R or Python is better for data science and which one will win the beauty contest. Another is frequentist vs. Bayesian statistics and why one of them will become obsolete. Or my favorite: SQL is dead, and all data will be stored in NoSQL databases.
These are just instruments used to solve problems. The American philosopher Abraham Kaplan coined a concept called the law of the instrument, which he described as follows: “I call it the law of the instrument, and it may be formulated as follows: Give a small boy a hammer, and he will find that everything he encounters needs pounding.” It was popularized by the psychologist Abraham Maslow and is often summed up in the famous phrase “if all you have is a hammer, everything looks like a nail”.
The core function of any data-driven role is solving problems by extracting knowledge from data. A great data scientist first strives to understand the problem at hand, then defines the requirements for a solution, and only then decides which tools and techniques are the best fit for the task. In most business cases, the stakeholders you interact with do not care about the tools. They only care about answering tough questions and solving problems. Knowing how to select, use and learn tools and techniques is a minimum requirement for becoming a data scientist. A great data scientist knows that understanding the underpinnings of the business case is paramount to the success of a data science project.
A very dangerous state for any data scientist is being stuck in an infinite loop of analytic iterations: drilling in, finding insights, zooming out, looking at the macro level, redefining the hypothesis, zooming in again, looking at the most granular details, then rethinking, and round and round. This is called analysis paralysis, which is basically over-thinking the process in an attempt to find the “perfect” solution.
A great data scientist understands that there’s almost never a perfect solution, and that a simple, imperfect solution delivered on time is much better than a hypothetically perfect one delivered late. In fact, the Agile software development methodology seeks to prevent analysis paralysis by employing adaptive evolutionary planning, early delivery and continuous improvement. The mindset of a great data scientist works in the same way: they focus on solving their stakeholders’ problems and know that those problems need to be redefined when new insights are uncovered.
The main piece of advice here: don’t overthink and over-analyze the problem. Instead, split your analysis or modelling process into stages and get feedback from the problem owners at each stage. This way you ensure that the learning process is continuous and that the decisions improve with each iteration.
As you can see, there’s a lot of communication involved in understanding the problem and giving stakeholders constant feedback. But this only scratches the surface of why communication matters. A much more important element is asking the right questions. Sounds easy, right? It’s not, actually. Data scientists are much more likely than people in other occupations to fall into the trap of a cognitive bias known as the curse of knowledge. This bias “occurs when an individual, communicating with other individuals, unknowingly assumes that the others have the background to understand.”
When a data scientist is scoping out a problem together with the stakeholders or presenting the first findings, it is vital to be as explicit and detailed as possible and not to assume that the stakeholders know as much as you do. This is very hard, because the assumptions a data scientist makes and the methodologies they rely on can number in the dozens, even hundreds.
The biggest risk arises when a stakeholder briefly describes the problem and the data scientist, instead of asking enough questions, assumes what the problem is. The data scientist then builds a solution that seems to solve the problem as described. Too few questions and too many assumptions lead to a situation where the final solution actually solves a different problem from the original one and gives the opposite recommendation or result.
Great data scientists never assume they know something without in-depth analysis. They think in hypotheses that need to be either rejected or confirmed, and they ask a lot of questions, even if they are 99.9% sure they know the answer.
The fact of the matter is that you must have the technical skills and a strong foundation to be hired as a data scientist. You can read more about the basic requirements in my previous blog post "How to think like a data scientist to become one". This is what’s expected and required of you as the bare minimum.
But to become a truly great data scientist, you have to be the ultimate problem solver, obsessed with understanding the ins and outs of the business case you are handed.