
This question was recently posted by Larry Wasserman on the Normal Deviate blog (see extract below). Larry is a statistics and machine learning professor at Carnegie Mellon University.

Here is my answer:

Data science is more than statistics: it also encompasses computer science and business concepts, and it is far more than a set of techniques and principles. I could imagine a data scientist without a degree; this is not possible for a statistician. But the core of the issue, in my opinion, is explained below.

  • I am one of the people who helped popularize the term data science. Ironically, I'm a pure statistician (Ph.D. in statistics, 1993, computational statistics), although I have changed a lot since 1993 and am now an entrepreneur. I tried hard to move away from being called a statistician to being called something (anything) else because of the American Statistical Association: they killed the term statistician and limited career prospects for future statisticians by associating it narrowly and almost exclusively with the pharmaceutical industry and small data (where most of the ASA's revenue comes from). They missed the boat (on purpose, I believe) on the new statistical revolution that came with big data over the last 15 years.
  • Statisticians should be very familiar with computer science, big data and software: 10 billion rows with 10,000 variables should not scare a true statistician. On the cloud (or even on my laptop, as streaming data), such a data set gets processed very quickly. The first step is data reduction, but even if you must keep all observations and variables, it is still feasible. Good computer scientists also produce confidence intervals; you don't need to be a statistician for that, just use the First AnalyticBridge Theorem (if you are curious, check out the Second AnalyticBridge Theorem). The distinction between computer scientist and statistician has been getting thinner and fuzzier over the years. Whatever you did not learn at school (in statistics classes), you can still learn online.
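
As a rough illustration of how a confidence interval can be obtained purely by resampling, with no distributional assumptions, here is a minimal percentile-bootstrap sketch. Note that this is a standard, swapped-in technique and not the First AnalyticBridge Theorem itself; the function name `bootstrap_ci` and its defaults (2,000 resamples, 95% level, fixed seed) are illustrative assumptions.

```python
import random
import statistics


def bootstrap_ci(data, stat=statistics.fmean, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for stat(data).

    Hypothetical helper for illustration: resample the data with replacement,
    recompute the statistic on each resample, and read off the empirical
    alpha/2 and 1 - alpha/2 quantiles of the resampled estimates.
    """
    if not data:
        raise ValueError("data must be non-empty")
    rng = random.Random(seed)  # fixed seed so results are reproducible
    n = len(data)
    estimates = sorted(
        stat(rng.choices(data, k=n))  # one bootstrap resample of size n
        for _ in range(n_resamples)
    )
    k = int(alpha / 2 * n_resamples)  # number of estimates trimmed from each tail
    return estimates[k], estimates[n_resamples - 1 - k]


if __name__ == "__main__":
    rng = random.Random(1)
    sample = [rng.gauss(10.0, 2.0) for _ in range(500)]  # synthetic data
    low, high = bootstrap_ci(sample)
    print(f"95% bootstrap CI for the mean: [{low:.3f}, {high:.3f}]")
```

The same function works for any statistic passed as `stat` (for example `statistics.median`), which is the appeal of resampling methods: the interval comes from computation rather than from a closed-form formula.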

This diagram misses a few key concepts, including business and domain knowledge.

Here's the article:

As I see newspapers and blogs filled with talk of “Data Science” and “Big Data” I find myself filled with a mixture of optimism and dread. Optimism, because it means statistics is finally a sexy field. Dread, because statistics is being left on the sidelines.

The very fact that people can talk about data science without even realizing there is a field already devoted to the analysis of data — a field called statistics — is alarming. I like what Karl Broman says:

When physicists do mathematics, they don’t say they’re doing “number science”. They’re doing math.

If you’re analyzing data, you’re doing statistics. You can call it data science or informatics or analytics or whatever, but it’s still statistics.

Well put.

Maybe I am just pessimistic and am just imagining that statistics is getting left out. Perhaps, but I don’t think so. It’s my impression that the attention and resources are going mainly to Computer Science. Not that I have anything against CS of course, but it is a tragedy if Statistics gets left out of this data revolution.

Two questions come to mind:

1. Why do statisticians find themselves left out?

2. What can we do about it?

Read full article.

Comment by Carey G. Butler on December 4, 2013 at 8:00pm

I think I want to be a nerd when I grow up. ;)

Good article and thanks for the links!

Comment by Pradyumna S. Upadrashta on September 27, 2013 at 9:50am

I think the key differentiator between a Data Scientist and a Statistician is accountability and commitment. A data scientist must be accountable for the outcomes; a statistician doesn't have to be. While a data scientist may not have a degree, I'd have a hard time imagining such a person being a great data scientist. Theory informs context, which informs practice; without being grounded in theory, it is hard to see how a practitioner would roll out the best and greatest work. The difference between a statistician and a data scientist is the difference between a consultant and an operations specialist. The latter tends to stick with the same problem much longer and see it through its different stages of evolution, from the immature conceptual stage all the way through the (N+1)th iteration. Statisticians don't commit to their models. Data Scientists do. It is (and should be) by its very nature a much more operational role that demands more responsibility and a deeper sense of accountability.

Comment by Mark Samuel Tuttle on June 4, 2013 at 12:03pm

Dear Lynne, I agree completely. Those who deal with data cleansing for a living develop their own disciplines and tools, but, as you observe, these seem 1) not to get turned into products, and 2) not to get written down. Also, some paradigms (think Google) just use more data to overcome the need to cleanse data. Put differently, lots of smart people with unbounded computing resources can approach problems differently. But, of course, that assumes unbounded data. In healthcare, for instance, there is rarely enough, no matter what.

Comment by Lynne Mysliwiec on June 4, 2013 at 10:56am

Two questions come to mind:

1. Why do statisticians find themselves left out?
Because we're not working on the big problem; computer scientists are. Someone said that the best way to succeed as a researcher is to identify the big problems in your discipline and work on those: small problems are uninteresting and will almost never take an academic to the pinnacle of his or her career. For me, the big problem in statistics is data: data hygiene, data integrity, data collection, data manipulation, data volume.

If 75% of any project is consumed by data acquisition, cleansing, matching, denormalization, taxonomy, and other manipulation, and only 25% is devoted to science and analysis, then the focus SHOULD end up being on the data rather than the analysis.

2. What can we do about it?
If we as statisticians want to be in the light of the sun and bask in the general approval of the world and our peers, then we must solve the data problem and make it possible for our colleagues to spend less than half their time in data acquisition heck and more than half their time solving business problems.   

I get 20 calls a year from tool providers telling me that they have yet another tool that will make model building faster, but no calls (or a vanishingly small number) promising easy data manipulation and handling. That's clearly backwards. I have already solved the problem of efficient delivery once the data are clean: I can build dozens or hundreds of models without much effort at all. However, I still spend the bulk of my time designing the data extracts and devising how best to use that data effectively. Until that problem is solved, I fear we will stay out in the cold.

Comment by Lizzy Soto Hernández on May 7, 2013 at 5:12am

I think it all depends on how you want to apply "Data Science". You need to be immersed in the business where you are applying the techniques before proposing new applications to the company or university. Your knowledge of the applications you need determines what abilities you must develop to reach your goals.

Comment by Kalyanaraman K on May 3, 2013 at 3:21pm

This is a very important question, especially for statisticians. In fact, statistics started from data. Many statistics professors will have seen the photograph of Prof. R. A. Fisher sitting in a field at his research station. Also, many statistical methods arose from practical needs involving data. For example, the correlation coefficient came out of a requirement for a geologist and was later refined by Karl Pearson. The celebrated concept of maximum likelihood is a consequence of the practice of observing the sky to judge distances between objects. However, this came to an end once probability had its axioms. When Prof. Fisher published his celebrated paper on the mathematical foundations of statistics, people in statistics started wearing mathematical attire, and the use of data, and training in it, took second place. At times, statisticians working with data were even treated as second-rate researchers. Hence, statisticians moved away from data and now talk highbrow mathematics. Still, a few keep the old concepts intact. I also agree that this is a new opening for statistics. But it has a major problem: the whole issue is driven by business requirements rather than knowledge requirements.

Comment by Joseph Hilbe on April 25, 2013 at 10:13am

The end of statistics? Hardly. I certainly don't see the ASA being heavily influenced by the pharmaceutical industry. I've been a member for well over a quarter of a century, am an ASA Fellow and chair-elect for an ASA Section, and helped start another section 20 years ago. A statistician can be involved with huge data sets, as those in astrostatistics and, to a lesser extent, health outcomes analysis can attest, or they can focus on small data where exact statistical methods are appropriate. There are those in ecology, forestry, social science and econometrics, health outcomes and medical statistics, geostatistics, epidemiology, and so on: all are areas where statistical techniques can be applied. Statistics evolves. Twenty years ago there were very few Bayesians; now Bayesian methods are taking over many areas of statistics. In fact, statistics has evolved directly with computing power. The highly iterative nature of much of current statistics was barely possible, and comparatively slow, on any PC in 1995 compared to now. We can develop new methods to exploit the new technologies.

Big Data and writing efficient software routines are vital in my area, astrostatistics. The field is in fact one with astroinformatics, which deals with how best to handle truly huge amounts of data. New statistical methods are now being developed to properly analyze this type of data. But it's still statistics. If you are using mathematical models to classify and predict future data, or data outside the data used in the model, you are doing statistics. Statistics as a term refers to other types of analyses as well. But just because there is an interest in modeling huge data situations does not mean you are not using statistical techniques, albeit perhaps new and innovative ones.

Comment by Mark Samuel Tuttle on April 25, 2013 at 7:05am

Vincent,

Your answers explained some things for me - that "statistics" narrowed its professional focus.  I'm working on a paper; if it comes to fruition I will follow up with you on this topic.

Thank you.

Comment by John Larimore on April 24, 2013 at 2:47pm

I am at the beginning of my career and am definitely facing this ambiguity. I am getting my Bachelor of Science in less than a month, and am in an absolute panic because my computing skills are limited to a single SAS programming class and using R and Minitab in assignments. I am taking a MOOC in machine learning, got a JavaScript internship, and am generally scrambling to put together a strong enough computational skill set to compete. All that said, if I could do it over, I would have done more computing in college, but would not have changed majors. I am still glad I know the difference between probability distributions, know some of the theoretical connections between probability and statistics, etc.

Comment by Andres Rincon on April 24, 2013 at 9:56am

Dear Vincent, I believe it is not the end of statistics; it is the beginning of something larger. All the pieces you depict in the diagram make sense as a move toward a much more complete point of view. However, it is quite important to involve the statistician in all of them in order to get useful results.

If we do not realize this, someone else may take the lead, but it is up to statisticians to lead others toward a better understanding of the information.

© 2016 Data Science Central