This question was recently posted by Larry Wasserman on the Normal Deviate blog (see extract below). Larry is a statistics and machine learning professor at Carnegie Mellon University.

Here is my answer:

Data science is more than statistics: it also encompasses computer science and business concepts, and it's far more than a set of techniques and principles. I could imagine a data scientist not having a degree - this is not possible for a statistician. But the core of the issue, in my opinion, is explained below.

  • I am one of the people who contributed to the adoption of the term data science. Ironically, I'm a pure statistician (Ph.D. in statistics, 1993 - computational statistics), although I have changed a lot since 1993 and am now an entrepreneur. The reason I tried hard to move away from being called a statistician to being called something (anything) else is the American Statistical Association: they killed the keyword statistician and limited career prospects for future statisticians by making it almost exclusively associated with the pharmaceutical industry and small data (where most of its revenue comes from). They missed the boat - on purpose, I believe - of the new statistical revolution that came along with big data over the last 15 years.
  • Statisticians should be very familiar with computer science, big data and software: 10 billion rows with 10,000 variables should not scare a true statistician. On the cloud (or even on my laptop as streaming data), it gets processed really fast. The first step is data reduction, but even if you must keep all observations and variables, it is still feasible. And good computer scientists also produce confidence intervals - you don't need to be a statistician for that, just use the First AnalyticBridge Theorem (if you are curious, check out the Second AnalyticBridge Theorem). The distinction between computer scientist and statistician is getting thinner and fuzzier over the years. The things you did not learn at school (in statistics classes), you can still learn online.
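
To make the streaming point concrete, here is a minimal sketch of a one-pass confidence interval for a mean. It is not the AnalyticBridge theorem mentioned above; it swaps in Welford's online algorithm with a standard normal approximation, and the function name `streaming_mean_ci` and the default `z = 1.96` are assumptions chosen for illustration.

```python
import math
from typing import Iterable, Tuple


def streaming_mean_ci(values: Iterable[float], z: float = 1.96) -> Tuple[float, float, float]:
    """Return (mean, lower, upper) from a single pass over the data.

    Hypothetical helper: uses Welford's online update, so memory use
    stays constant no matter how many rows are streamed through.
    """
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in values:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    if n < 2:
        raise ValueError("need at least two observations")
    variance = m2 / (n - 1)  # unbiased sample variance
    half_width = z * math.sqrt(variance / n)  # normal approximation
    return mean, mean - half_width, mean + half_width
```

Because the function accepts any iterable, it can be fed a generator that reads rows from a file or a socket, so the full data set never has to sit in memory. The normal approximation is only reasonable for large samples of roughly independent observations.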

This diagram misses a few key concepts - including business and domain knowledge

Here's the article:

As I see newspapers and blogs filled with talk of “Data Science” and “Big Data” I find myself filled with a mixture of optimism and dread. Optimism, because it means statistics is finally a sexy field. Dread, because statistics is being left on the sidelines.

The very fact that people can talk about data science without even realizing there is a field already devoted to the analysis of data — a field called statistics — is alarming. I like what Karl Broman says:

When physicists do mathematics, they don’t say they’re doing “number science”. They’re doing math.

If you’re analyzing data, you’re doing statistics. You can call it data science or informatics or analytics or whatever, but it’s still statistics.

Well put.

Maybe I am just pessimistic and am just imagining that statistics is getting left out. Perhaps, but I don’t think so. It’s my impression that the attention and resources are going mainly to Computer Science. Not that I have anything against CS of course, but it is a tragedy if Statistics gets left out of this data revolution.

Two questions come to mind:

1. Why do statisticians find themselves left out?

2. What can we do about it?

Read full article.


Comment by Michel Baudin on August 8, 2019 at 4:23am

When I look at books about statistics, they seem to start with a  dataset and end with an inference. They cover neither data acquisition, cleaning and munging nor the communication of results to non-statisticians. I see data science as encompassing this entire workflow,  and I think that’s why university statistics departments are changing their names to “data science.”

This being said, taken literally, “data science” is too broad a term, synonymous with computer science. Originally, “statistics” was  the collection of data about the state, which was too narrow. Finding the right words is hard.

Comment by Maryam Zolghadr on September 30, 2017 at 5:29pm

And as long as statisticians think that data science is statistics and that what computer scientists, biologists, archaeologists, ..., do with data is just statistics, we are not going anywhere. What I see is that while a statistician is waiting for the results of a paper-based survey to come back and form his dataset, a computer scientist simply writes scripts to scrape tons of data, does some analyses and publishes the results.

Comment by Maryam Zolghadr on September 30, 2017 at 5:01pm

A big like for this article. I studied statistics and graduated recently. I actually believe that statisticians totally missed the boat. Statisticians had something valuable, but we easily lost it to computer scientists and physicists. All the data scientists I know around me are either computer scientists or physicists. I took a computational course in the physics department, and it was interesting to see how an old professor had updated himself with new programming languages and technologies. He uses GitHub, Python, programming forums, .... In my (statistics) department, however, it is as if people are static. They have remained in the same place as when they graduated 20 years ago. If they used Matlab for their PhD theses, they know just Matlab, and barely R. They have never touched Python, C++ or Java. Even those who study computational biology are better programmers than statisticians. I think it is easy to see how much a graduate student in statistics is worth today. You can line up a graduate student in computer science and a graduate student in statistics for a job interview in data science/big data and see which one gets the position.

Comment by Carey G. Butler on December 4, 2013 at 8:00pm

I think I want to be a nerd when I grow up. ;)

Good article and thanks for the links!

Comment by Mark Samuel Tuttle on June 4, 2013 at 12:03pm

Dear Lynne, I agree completely.  Those who deal with data cleansing for a living develop their own disciplines and tools, but - as you observe - these seem 1) not to get turned into products, and 2) not to get written down.  Also, some paradigms - think Google - just use more data to overcome the need to cleanse data. Put differently, lots of smart people with unbounded computing resources can approach problems differently.  But, of course, that assumes unbounded data.  In healthcare, for instance, there is rarely enough, no matter what.

Comment by Lynne Mysliwiec on June 4, 2013 at 10:56am

Two questions come to mind:

1. Why do statisticians find themselves left out?
Because we're not working on the big problem -- computer scientists are.  Someone said that the best way to be successful as a researcher is to identify the big problems associated with your discipline and work on those -- small problems are uninteresting and they'll almost never get a pedagogue to the pinnacle of his or her career.  The big problem for me in statistics is data.  Data hygiene, data integrity, data collection, data manipulation, data volume. 

If 75% of any project is consumed by the process of data acquisition, cleansing, matching, denormalization, taxonomy, and other manipulation and only 25% devoted to science and analysis, then the focus SHOULD end up being on the data rather than the analysis. 

2. What can we do about it?
If we as statisticians want to be in the light of the sun and bask in the general approval of the world and our peers, then we must solve the data problem and make it possible for our colleagues to spend less than half their time in data acquisition heck and more than half their time solving business problems.   

I get 20 calls a year from tool providers telling me that they have yet another tool that will make model building faster -- but not any (or a vanishingly small number of) calls that promise easy data manipulation and handling.  That's backwards, clearly.  I have already solved the problem of efficiency of delivery once the data are clean -- I can build dozens or hundreds of models without much effort at all.  However, I still spend the bulk of my time designing the data extracts and devising how best to use that data effectively -- until that problem is solved, then I fear we will stay out in the cold.


Comment by Lizzy Soto Hernández on May 7, 2013 at 5:12am

I think it all depends on how you want to apply "Data Science". You need to be immersed in the business where you are applying the techniques before proposing new applications to the company or university. Your knowledge of the applications you will need determines which skills you have to develop to reach your goals.

Comment by Kalyanaraman K on May 3, 2013 at 3:21pm

This is a very important question, especially for statisticians. In fact, statistics started from data. Many statistics professors might have seen the photograph of Prof. R. A. Fisher sitting in a field at his research station. Also, many statistical methods came from the need to do something with data. For example, the correlation coefficient came out of a requirement for a geologist, and was later refined by Karl Pearson. The most celebrated concept of Maximum Likelihood is a consequence of the practice of observing the sky to judge distances between objects. However, this came to an end when probability got its axioms. When Prof. Fisher brought out his most celebrated paper on Mathematical Foundations Of Statistics, people in statistics started wearing mathematical attire, and data use and training started taking second place. In fact, at times statisticians working with data were treated as second-rate researchers. Hence, statisticians moved away from data and are talking highbrow mathematics. But there are still a few with the old concepts intact. I also concur that this is a new opening for statistics. But it has a major problem, in that the whole issue is driven by business requirements rather than knowledge requirements.

Comment by Joseph Hilbe on April 25, 2013 at 10:13am

The end of statistics? Hardly. I certainly don't see the ASA being heavily influenced by the pharmaceutical industry. I've been a member for well over a quarter of a century, am an ASA Fellow and chair-elect for an ASA Section, and helped start another section 20 years ago. A statistician can be involved with huge data sets, as those in astrostatistics and to a lesser extent health outcomes analysis can attest, or they can focus on small data where exact statistical methods are appropriate. There are those in ecology, in forestry, social science and econometrics, health outcomes and medical statistics, geostatistics, epidemiology, and so on --- all are areas where statistical techniques can be applied. Statistics evolves. Twenty years ago there were very few Bayesians; now Bayesian analysis is taking over many areas of statistics. In fact, statistics has evolved directly with computing power. The highly iterative nature of much of current statistics was barely possible, and comparatively slow, on any PC in 1995 compared to now. We can develop new methods to exploit the new technologies.

Big Data and writing efficient software routines are vital in my area - astrostatistics. The field is in fact one with astroinformatics, which deals with how best to handle truly huge amounts of data. New statistical methods are now being developed to properly analyze this type of data. But it's still statistics. If you are using mathematical models to classify and predict future data or data outside the data used in the model, you are doing statistics. Statistics as a term refers to other types of analyses as well. But just because there is an interest in modeling huge data situations does not mean you are not using statistical techniques, albeit perhaps new and innovative statistical techniques.

Comment by Mark Samuel Tuttle on April 25, 2013 at 7:05am


Your answers explained some things for me - that "statistics" narrowed its professional focus.  I'm working on a paper; if it comes to fruition I will follow up with you on this topic.

Thank you.
