What is Wrong with the Definition of Data Science

This is another provocative KDNuggets blog post: the data scientist is reduced to three circles, missing the biggest and most important one, which encompasses all three of them.

3 Areas of Data Science: Statistics, Computing, and Database

My answer: This Venn diagram misses the most important circle: domain expertise / business acumen. You can be a data scientist without computer science, statistics or databases (though it would be very difficult). You can't be a data scientist without deep domain expertise and horizontal business knowledge.

Steven Miller wrote: I met Jeremy Howard, formerly CEO of Kaggle, back in October. He said that only one winner of a Kaggle competition was a domain expert. Perhaps domain expertise isn't what's needed at all, because it creates bias that isn't easily overcome.

Here's my answer: Very few domain experts participate in Kaggle competitions, as they can make far more money leveraging their expertise on the job market, or by creating their own company. Winning a Kaggle contest does not mean that you have created added value. Data science without sustainable added value is not data science.

I am an expert in online advertising, ad exchanges and fraud detection. Without some sort of real expertise developing successful solutions on real data, I would not be a data scientist. What makes me a data scientist is this experience, and of course it involves (big) data. 

Knowing linear regression, clustering algorithms, time series, R, Perl, SQL, and database architecture is not what makes me a data scientist. Besides, those skills are easy to acquire - plenty of tutorials are available online. Your value and real knowledge is stuff that is not found online, for free. Otherwise, you could be replaced by a professional in Africa or Eastern Europe for a fraction of the cost, or your task could get automated and produced by a machine (I'm actually working on data science automation).

Three examples where domain expertise is critical:

  • Optimizing revenue for ad networks: Over time, business models have become more complicated than statistical models. You need to really understand the ad exchange ecosystem to figure out how and where mathematical optimization helps. You also need to keep your domain expertise current, as you are (1) helping your company win over smart competitors (who come up with unexpected new weapons all the time - you won't see the impact in your regular data set until it is too late, because your database is no longer tracking the most relevant fields, such as detailed mobile data), and (2) helping your company decide who to partner with and filter out bad apples based on information that does not show up in your data.
  • Fraud detection: Same thing: criminals leverage their business expertise to find a wide array of systems to make money. You might be the best data miner in the world analyzing the biggest data set, but if you have less business expertise than the criminals, you will lose. For instance, criminals have designed systems to entirely bypass captcha on sign-ups. If you manage a spam detection system and create the most sophisticated captcha system in the world, you might still fail: smart criminals will bypass it anyway, and real users will find the captcha test too difficult to pass and abandon the sign-up.
  • Financial risk modeling: The recent great recession has been blamed on Wall Street mathematicians who developed risk models for financial assets while being out of touch with reality, such as how bubbles work.

Comment by Bob Vanderheyden on May 11, 2016 at 5:36am

Vince,

We still see professed "data scientists" who insist that they are in fact better at providing insight and value because they know nothing about the data or domain that they are analyzing, making them "unbiased" in their assessments.

After working in "Data Science" for 20+ years, with extremely rare exceptions, I've never seen a person with no or limited domain expertise uncover a real insight that the business found valuable. On the other hand, I've seen people who are very skilled technically make horrible mistakes in their analyses, mistakes that discredited the discipline in the eyes of business executives.

Comment by Amit Baldwa on April 15, 2015 at 8:11am

Absolutely love it... Domain Expertise, Experience, at times, is not considered important.. But they are vital....
