Many statisticians think that data science is about analyzing data, but it is more than that. Data science also involves implementing algorithms that process data automatically, to provide automated predictions and actions, such as:

  • Automated bidding systems
  • Estimating (in real time) the value of all houses in the United States
  • High-frequency trading
  • Matching a Google Ad with a user and a web page to maximize chances of conversion
  • Returning highly relevant results to any Google search
  • Book and friend recommendations on sites such as Facebook
  • Tax fraud detection and detection of terrorism
  • Scoring all credit card transactions (fraud detection)
  • Computational chemistry to simulate new molecules for cancer treatment
  • Early detection of a pandemic
  • Analyzing NASA pictures to find new planets or asteroids
  • Weather forecasts
  • Automated piloting (planes and cars)
  • Client-customized pricing system (in real time) for all hotel rooms

The problems cover astronomy, fraud detection, social network analytics, search engines, finance (transaction scoring), environment, drug development, trading, engineering, pricing optimization (retail), energy (smart grids), bidding and arbitrage systems.

All this involves both statistical science and terabytes of data. Most people doing this stuff do not call themselves statisticians. They call themselves data scientists.

Statisticians have been gathering data and performing linear regressions for several centuries. DAD (discover / access / distill) as performed by statisticians 300 years ago, 20 years ago, today, or in 2015 for that matter, has little to do with DAD as performed by data scientists today. The key message here is that eventually, as more statisticians pick up these new skills and more data scientists pick up statistical science (sampling, experimental design, confidence intervals - not just the ones described in chapter 5 in our book), the frontier between data scientists and statisticians will blur. Indeed, I can see a new category of data scientists emerging: data scientists with strong statistical knowledge, just as we already have a category of data scientists with significant engineering experience (Hadoop).

Also, what makes data scientists different from computer scientists is that they have a much stronger statistics background, especially in computational statistics, but sometimes also in experimental design, sampling, and Monte Carlo simulations.

© 2015 Data Science Central