Why So Many ‘Fake’ Data Scientists?

Have you noticed how many people are suddenly calling themselves data scientists? Your neighbour, that gal you met at a cocktail party — even your accountant has had his business cards changed!


There are so many people out there who suddenly call themselves ‘data scientists’ because it is the latest fad. The Harvard Business Review even called it the sexiest job of the 21st century! But in fact, many of those calling themselves data scientists lack the full skill set I would expect if I were in charge of hiring a data scientist.

What I see is many business analysts who have no understanding of big data technology or programming languages calling themselves data scientists. Then there are programmers from the IT function who understand programming but lack the business skills, analytics skills or creativity needed to be a true data scientist.

Part of the problem here is simple supply and demand economics: There simply aren’t enough true data scientists out there to fill the need, and so less qualified (or not qualified at all!) candidates make it into the ranks.

The second problem is that the role of a data scientist is often ill-defined within the field and even within a single company. People throw the term around to mean everything from a data engineer (the person responsible for creating the software “plumbing” that collects and stores the data) to a statistician who merely crunches the numbers.

A true data scientist is so much more. In my experience, a data scientist is:

  • multidisciplinary. I have seen many companies try to narrow their recruiting by searching only for candidates who have a PhD in mathematics, but in truth, a good data scientist could come from a variety of backgrounds, and may not necessarily have an advanced degree in any of them.
  • business savvy.  If a candidate does not have much business experience, the company must compensate by pairing him or her with someone who does.
  • analytical. A good data scientist must be naturally analytical and have a strong ability to spot patterns.
  • good at visual communications. Anyone can make a chart or graph; it takes someone who understands visual communications to create a representation of data that tells the story the audience needs to hear.
  • versed in computer science. Professionals who are familiar with Hadoop, Java, Python, etc. are in high demand. If your candidate is not an expert in these tools, he or she should be paired with a data engineer who is.
  • creative. Creativity is vital for a data scientist, who needs to be able to look beyond a particular set of numbers, beyond even the company’s data sets to discover answers to questions — and perhaps even pose new questions.
  • able to add significant value to data. If someone only presents the data, he or she is a statistician, not a data scientist. Data scientists offer great additional value over data through insights and analysis.
  • a storyteller. In the end, data is useless without context. It is the data scientist’s job to provide that context, to tell a story with the data that provides value to the company.

If you can find a candidate with all of these traits — or most of them with the ability and desire to grow — then you’ve found someone who can deliver incredible value to your company, your systems, and your field.

But skimp on any of these traits, and you run the risk of hiring an imposter, someone just hoping to ride the data science bubble until it bursts.

What would you add to this list? I’d love to hear your thoughts in the comments below.

About the author: Bernard Marr is a globally recognized expert in analytics and big data. He helps companies manage, measure, analyze and improve performance using data.

His new book is Big Data: Using Smart Big Data, Analytics and Metrics To Make Better Decisions and Improve Performance.


Comment by Vassilios Rendoumis on July 26, 2015 at 2:18am

The post title seems a reasonably good question. But why so many fake data science job ads?

Comment by Kevin Wang on July 25, 2015 at 2:02am

Bernard, I've read many of your interesting articles, but I felt this one is not really accurate and grossly underestimates what a statistician does. A good applied statistician would have most if not all of the skill sets you defined for a "data scientist". In fact, a statistician does not "merely crunch the numbers" or "only present the data". As a statistician who worked in the management consulting sector for a few years, I certainly didn't just "present the data" to my clients. I had to be "business savvy" and "a storyteller", and being "multidisciplinary" is also expected of a statistician.

Now....whether that also makes me a data scientist is a different story, but I'd like to think that I'm proud to be a statistician who does much more than just present the data!

PS -- I actually tried to post this on LinkedIn but had trouble commenting on your link on LinkedIn.

Comment by Orlando Vivin Tucker on July 22, 2015 at 6:14am

This was a very good post!  I have noticed much of this myself.  I especially like the part that says that a true data scientist can not only number crunch like a mere statistician, but can tell a story from that data that the company can benefit from.  I also think that a good data scientist can run away with prescriptive and predictive analytics.  We need more data scientists out there, but at the same time, we need them to be competent in their profession.

Comment by Life Skipper on July 7, 2015 at 9:48pm

Very interesting and timely post.

My 2 cents as a new data scientist (not so new in life, though), from my experience so far (admittedly not a long one as a professional):

I started studying statistics as a hobby, and after finishing a few courses, I realised that I lacked the math skills to go forward.

I had to go back and relearn calculus before I really started understanding the underlying process in what I was doing in practice.

Fact is, I've seen so many "analyses" from "data scientists" and "statisticians" which, if anyone bothered to research a little, would reveal a lack of any understanding of the subject matter.

I've seen reports from "official" bodies that could have been made by some high school student, and not a good one at that.

In all, what I see is that people who learn a few ways to manage data, make a table and throw some numbers into a machine to get some results don't really know what the details are and how they come into play in every case and scenario.

Analyses of political positions and beliefs are the example that jumps up first.

Living in Greece, I was subjected to a vast amount of lies, of propaganda that was targeting the "average" citizen.

They report for citizens they consider illiterate in statistics, and they tried and tried to cover up truths and distort reality. And all because, in my view, they lack the skills and experience to understand what these predictions do to society. Not only is it irresponsible and maybe even illegal, but it shows the lack of any real understanding of anything beyond some numbers on paper.

I started in statistics pushed by the need to confront these lies and to try to balance the public dialogue with informed opinions based on real facts, and to produce analyses that take into consideration all aspects that influence a situation, without leaving anything out with the intention of producing a false image or making a prediction with the sole purpose of directing public opinion. Such analyses are a danger not only to statistics and science, but to society in general.

You wouldn't want a friend who would sell you as a slave in exchange for a few dollars, would you?

Why would you accept their analysis and advice, then? Why would you consider their predictions at all, if you know they are biased already?

They are lies. Plain lies for a purpose.

So this is what I believe is a vital aspect of this conversation: to put humanity back into science.

To not become data nazis (maybe I am overusing the Nazi term by describing the situation this way).

To learn to take the human factor as the center of the analysis.

People who believe they know, or pretend they know, are contributing to an ideology of a new kind of "racism": this one against the poor.

And yes, these self-declared data scientists might be a problem for those of us who are after a job, but in my mind this is not as important as human dignity, self-respect and humanism, qualities that every scientist should have on top of his "scientific mind".

And yes, I am a "leftie", as you'd say in the western world. Not a communist though, not by a mile. I just want an equal, shared, just world, as most people do (I hope that's still the case).

Sorry if I took you a bit off course, but I thought it would be interesting to bring in the human factor in a different way and talk about the responsibility of data scientists (and all other kinds of scientists) towards our fellow humans, those we supposedly want to live our lives with in peace and prosperity....

Maybe I got it all wrong. Maybe we should all be trying to eat each other. If that's the case, please eat me first. :)

best regards to all


half way through data science path 

PS: Noticeable is the ease with which some people who sit in teaching positions in universities around the world (usually not the major ones) have each created their own measures, corrections and scales, and their own ways of going about what we'd call analysis. The age-old argument between the Bayesian and frequentist approaches is just one example of what I'm talking about. So anyone who wants to call him/herself a scientist should, in my view, have a sound understanding of all approaches and take them all into consideration in order to act scientifically. :):)

If there is an artistic element in data science, it is, in my view, the ability to perform analysis and prediction that is actually relevant and seeks answers that better the lives of as many people as possible without hurting anyone, not make them less humane.

For me, science has no other purpose (to explain for the sake of explanation is nothing if the explanation does not better human life....) :):)

Sorry for taking up so much space!!:):)

again all the best!


Comment by Gagan Mehra on July 6, 2015 at 7:10pm

3 questions to ask before hiring a data scientist - https://www.linkedin.com/pulse/3-questions-ask-before-hiring-data-s...

Comment by Martyn Jones on July 5, 2015 at 2:44am
5 – Big Data Certification

By 2016 there will be global demand for 30 billion Big Data professionals. Are you prepared to cash in on that inevitability? No? Then consider this.

One of my best friends makes his living as a completely phony Big Data Scientist. For two hundred bucks he can make you a Data Scientist or a Big Data guru. Some guys give you an education but this guy gives you immediate access to high paying jobs, sex and a life in the city. Moreover, for an extra 250 bucks you can also become a certified Big Data Trainer, which will allow you to do unto others what has been done unto you.

Comment by Cahlen Humphreys on July 4, 2015 at 5:32am

I'm willing to bet that roles iron out within the next five years, especially when academic institutions start implementing graduate degrees in data science here in the US.  Right now, you're correct in that the roles are ambiguous, and in many cases depending on who you work for your role changes with each contract.  I suppose that is the great thing about consulting -- one week you're a software engineer, then big data engineer, then data scientist, then analyst, then solutions architect.  I think it makes for an exciting thing though.

Comment by Richard Ordowich on July 3, 2015 at 1:40am

With no agreed-upon definition or qualifications for a data scientist, how can we debate who is and who is not a data scientist?

Like data, this term is man-made. Its origins appear to be marketing articles. We seem to spend a lot of time creating new titles within the data sphere (data steward, data owner, Chief Data Officer, Data Scientist, etc.) and then spending time arguing about their merits and interpretations. This is the same with data.

Little effort goes into the design of the data such as the semantics and pragmatics but a lot of time is spent questioning and interpreting the data. We've created a perpetual data production machine. Data that begets data. Perhaps we should focus on teaching data literacy to everyone rather than bestowing titles.

Comment by Max Galka on July 2, 2015 at 9:49am

Think this is a great debate, and an important one for the industry. But I think there is a deeper issue at play.

One could write a very similar article about the use of the term "big data," which now gets used as a synonym for "a lot of data." The same goes for "data mining." It originally referred to methods of analyzing data. As it is commonly used today, it is just a fancy term for "finding data."

Somehow every "data" term ends up getting vulgarized to the point of losing its meaning.

Comment by Bruce Wasson on July 2, 2015 at 9:11am

Good post. Our preference for “Hadoop or nothing” is the likely cause of this symptom. If we cared about our clients a bit more, we’d probably be more prepared to democratize our clients' data by making NoSQL and relational databases integral to their Big Data technology stack, at least until all-Hadoop matured into the better way. Doing so would free our clients to extract more value from their Big Data themselves, at a fraction of the cost that comes from the usual way.
