Many statisticians think that data science is about analyzing data, but it is more than that. Data science also involves implementing algorithms that process data automatically, to provide automated predictions and actions, such as:

  • Automated bidding systems
  • Estimating (in real time) the value of all houses in the United States (Zillow.com)
  • High-frequency trading
  • Matching a Google Ad with a user and a web page to maximize chances of conversion
  • Returning highly relevant results to any Google search
  • Book and friend recommendations on Amazon.com or Facebook
  • Tax fraud and terrorism detection
  • Scoring all credit card transactions (fraud detection)
  • Computational chemistry to simulate new molecules for cancer treatment
  • Early detection of a pandemic
  • Analyzing NASA pictures to find new planets or asteroids
  • Weather forecasts
  • Automated piloting (planes and cars)
  • Client-customized pricing system (in real time) for all hotel rooms

The problems cover astronomy, fraud detection, social network analytics, search engines, finance (transaction scoring), environment, drug development, trading, engineering, pricing optimization (retail), energy (smart grids), bidding and arbitrage systems.

All this involves both statistical science and terabytes of data. Most people doing this work do not call themselves statisticians; they call themselves data scientists.

Statisticians have been gathering data for centuries and performing linear regressions for more than two hundred years. Yet DAD (discover / access / distill) as performed by statisticians 300 years ago, 20 years ago, today, or in 2015 for that matter, has little to do with DAD as performed by data scientists today. The key message here is that eventually, as more statisticians pick up these new skills and more data scientists pick up statistical science (sampling, experimental design, confidence intervals, and not just the ones described in chapter 5 of our book), the frontier between data scientists and statisticians will blur. Indeed, I can see a new category of data scientists emerging: data scientists with strong statistical knowledge, just as we already have a category of data scientists with significant engineering experience (Hadoop).

Also, what makes data scientists different from computer scientists is that they have a much stronger statistics background, especially in computational statistics, but sometimes also in experimental design, sampling, and Monte Carlo simulations.

Comment by Dragos Bandur on January 9, 2017 at 7:13pm

Hello,

I still enjoy articles that justify statistics, since it has become such a small part of a more marketable field named data science. Every time I read about the "distance" between statistics and data science, I tend to agree with the authors until I begin wondering how relevant the outcomes would be without checking their statistical significance with respect to a "null" state, or how different the predictions would be if the data had more quantity and less diversity, or how sound our modeling approach would be if we forgot that the data we have merely represents an instantiation of a stochastic process... "Data science is more than analyzing data"! Of course it is, because data needs to be made analyzable before we start analyzing it, and this truth is as old as assumptions in theorems.

Comment by Weissmann Luba on January 17, 2014 at 12:14pm
Hi,
I just have to disagree with you: as a statistician, I needed to perform at least 5 of the 14 points you mentioned above. I really think it is semantics, not a real difference. "Data scientist" is just a new name for the old science.

© 2017 Data Science Central