Many statisticians think that data science is about analyzing data, but it is more than that. Data science also involves implementing algorithms that process data automatically to provide automated predictions and actions.
All this involves both statistical science and terabytes of data. Most people doing this stuff do not call themselves statisticians. They call themselves data scientists.
Statisticians have been gathering data and performing linear regressions for several centuries. DAD (discover / access / distill) performed by statisticians 300 years ago, 20 years ago, today, or in 2015 for that matter, has little to do with DAD performed by data scientists today. The key message here is that eventually, as more statisticians pick up these new skills and more data scientists pick up statistical science (sampling, experimental design, confidence intervals, and not just the ones described in chapter 5 of our book), the frontier between data scientists and statisticians will blur. Indeed, I can see a new category of data scientists emerging: data scientists with strong statistical knowledge, just as we already have a category of data scientists with significant engineering experience (Hadoop).
Also, what makes data scientists different from computer scientists is that they have a much stronger statistics background, especially in computational statistics, but sometimes also in experimental design, sampling, and Monte Carlo simulations.
Comment
Hello,
I still enjoy articles that justify statistics, since it has become such a small part of a more marketable field called data science. Every time I read about the "distance" between statistics and data science, I tend to agree with the authors, until I begin wondering how relevant the outcomes would be without checking their statistical significance with respect to a "null" state, how meaningfully different the predictions would be if the data had more quantity and less diversity, or how sound our modeling approach would be if we forgot that the data we have merely represents one instantiation of a stochastic process... "Data Science is more than analyzing data"! Of course it is, because data needs to be made analyzable before we start analyzing, and this truth is as old as the assumptions in theorems.
© 2017 Data Science Central