
I think one of the issues is that academic statisticians who publish theoretical articles not based on data analysis are... not statisticians anymore. Also, many statisticians think that data science is about analyzing data, but it is more than that. It also involves implementing algorithms that process data automatically to provide automated predictions and actions, for example:

  • automated bidding systems
  • estimating (in real time) the value of all houses in the US (Zillow.com)
  • high frequency trading
  • matching a Google Ad with a user and a web page to maximize chances of conversion
  • returning highly relevant results to any Google search
  • book and friend recommendations on Amazon or Facebook
  • tax fraud detection, detection of terrorism
  • scoring all credit card transactions (fraud detection)
  • computational chemistry to simulate new molecules for cancer treatment
  • early detection of an epidemic
  • analyzing NASA pictures to find new planets or asteroids
  • weather forecasts
  • automated piloting (planes, cars)
  • client-customized pricing system (in real time) for all hotel rooms 

All this involves both statistical science and terabytes of data. People doing this stuff do not call themselves statisticians, in general. They call themselves data scientists.


Comment by Jaehyeon Kim on October 12, 2013 at 7:59pm

I guess it may be one of the reasons why, in the industry, it is often said, 'There is an abundance of data but a lack of knowledge.'

The following statement may contradict the above argument.

*Sir Ronald Aylmer Fisher, who popularized the maximum likelihood method, can be considered as a (theoretical) statistician even if his theorems were not based on data analysis.

Moreover, the argument can be exaggerated to bring out the contradiction.

*Albert Einstein and Peter W. Higgs may not be considered physicists, because neither the general theory of relativity nor the Higgs boson had been confirmed by observation at the time they published their papers (?).

Comment by James C. Nnannah on October 10, 2013 at 4:36am

@Amy: Talking about those academic statisticians who publish theoretical articles not based on data analysis, I really wonder what they could then be called. In every field of knowledge, there is always the 'research' part and the 'practice' part.

In scientific knowledge, a research scientist basically seeks to extend the frontiers and break new ground in specific knowledge domains. I think, therefore, that a statistician who concerns himself with 'researching' new and more efficient algorithms, formulae or models FORMING THE BASIS for solving known problems (like Vincent's 'dislike' for eigenvalues or his discovery of the "curse of big data"), and with publishing them, should not lose his 'statistician' title; rather, the title should be qualified as 'research statistician'. Just thinking...

Comment by Zahir Balaporia on October 4, 2013 at 3:19pm

The labels being used (Data Scientist vs Statistician) do not help get to the root cause. This is an age-old problem of theory vs practice. A statistician (or mathematician or physicist or ...) who is good with theory and not with practice is just that: a good theoretician. And a statistician (or mathematician or physicist or ...) who is good at solving problems in practice is just that: a good practitioner. We need both.

I stole the following quotes from a paper by John Price in the Journal of Strategic Leadership on the....  I think they are applicable to this conversation.

“He who loves practice without theory is like the sailor who boards a ship without a rudder and compass and never knows where he may cast.” - Leonardo da Vinci
“Experience without theory is blind, but theory without experience is mere intellectual play.” - Immanuel Kant

So, I recommend focusing on what makes for a good practitioner as attributes of good practice, instead of creating this statistician versus data scientist dichotomy. 

Comment by Fari Payandeh on October 1, 2013 at 9:11pm

Amy,

Great observation!

I can tell you from personal experience that being a data scientist is not an easy job. The company I worked for hired a PhD in Physics with 4 years of statistical modeling background. It didn't work out. My recommendation to anyone who wants to become a Data Scientist is to first and foremost learn about real-world data. You could be the greatest mathematician in the world, but not having hands-on experience with data will be an impediment. Learn a programming language like Python and run it against a database. I highly recommend the book "Visualize This". Don't let the title discourage you. It walks you through the necessary steps to create simple graphs using the data available on websites. It's a good place to start for someone with a strong statistics background and no database/programming background.
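
For anyone wondering what "run it against a database" might look like in practice, here is a minimal sketch. It assumes Python's built-in sqlite3 module and an in-memory database; the `sales` table, its columns, and the rows are all made up for illustration and do not come from any real dataset.

```python
import sqlite3


def average_price_by_city(rows):
    """Load (city, price) rows into an in-memory SQLite table and
    return the mean price per city, sorted by city name."""
    conn = sqlite3.connect(":memory:")
    try:
        # Hypothetical schema, chosen only for this illustration.
        conn.execute("CREATE TABLE sales (city TEXT, price REAL)")
        conn.executemany("INSERT INTO sales (city, price) VALUES (?, ?)", rows)
        cursor = conn.execute(
            "SELECT city, AVG(price) FROM sales GROUP BY city ORDER BY city"
        )
        return cursor.fetchall()
    finally:
        conn.close()


# Made-up rows, for illustration only.
example_rows = [
    ("Austin", 300000.0),
    ("Austin", 350000.0),
    ("Boston", 500000.0),
]

if __name__ == "__main__":
    for city, mean_price in average_price_by_city(example_rows):
        print(city, mean_price)
```

Real-world data is rarely this clean, which is probably the point of the advice: the same query against a production table would first need missing values, duplicates, and inconsistent city names dealt with.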

Comment by Vincent Granville on September 29, 2013 at 8:37pm

By having no data science training, I mean I never attended a course called 'data science'. But the first professor who ever taught the first class in the first MBA program didn't have an MBA certificate or degree either, and he/she was nevertheless qualified.

When I mentioned becoming a data scientist in 6 months, I meant an accelerated program for professionals who already have some experience and knowledge, possibly in computer science or engineering. Could you really become a data scientist from scratch in just 6 months? Maybe, I don't know. I could try to train my 12-year-old daughter to become a data scientist, and see if it's feasible. It might not be impossible: I've trained business analysts and interns to run web crawlers and SQL queries from within Perl scripts (including a from-scratch course on UNIX) in just one hour.

Comment by Pradyumna S. Upadrashta on September 29, 2013 at 1:09pm

@Randy: Yep! I would imagine this discussion can later be text mined to extract potentially useful predictors (degree, passion, experience, language use, personality, prior results) or some dimensionally reduced versions of these [dare I say, some 'eigenvectors'?] that form some kind of model of what makes someone a good Data Scientist.

On the theoretical statistics note, let me give you an example. There are people out there who think they are applied statisticians, but who don't know the difference between a sample and a population. They think they have a population, when in fact they are dealing with a really, really large sample. This means they don't know that there are unrealized outcomes that are not represented in the specific data they are looking at (the observed); their world-view is that of the proverbial big frog in the little pond. They will never know what they don't know, and they don't know what they do know, or even what they are looking at. Their analytic results for a specific question may be correct because they use some out-of-the-box, pre-packaged statistical method, but their ability to expand their thinking about the situation will be as limited as the frog in the pond, so they can't really innovate and improve upon the situation.
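
The sample-versus-population point can be illustrated with a small simulation. This is only a sketch under stated assumptions: the 1,000 possible outcomes, the Zipf-like weights, and the sample size of 10,000 are all hypothetical choices, not figures from the discussion. It draws a sample ten times larger than the number of possible outcomes and then counts the outcomes that never show up.

```python
import random


def simulate_unseen(num_outcomes=1000, sample_size=10_000, seed=0):
    """Draw a large sample from a skewed distribution and return
    (observed outcomes, possible outcomes that never appeared)."""
    rng = random.Random(seed)
    outcomes = list(range(1, num_outcomes + 1))
    # Zipf-like weights: outcome k is k times less likely than outcome 1.
    weights = [1.0 / k for k in outcomes]
    sample = rng.choices(outcomes, weights=weights, k=sample_size)
    observed = set(sample)
    unseen = set(outcomes) - observed
    return observed, unseen


if __name__ == "__main__":
    observed, unseen = simulate_unseen()
    print(len(observed), "outcomes observed;",
          len(unseen), "possible outcomes never seen")
```

An analyst who treats the observed set as the whole population would conclude that the unseen outcomes cannot happen, even though the generating process says otherwise.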

Comment by Randy Bartlett on September 29, 2013 at 12:48pm

I think this is a valuable conversation to have. I agree that we need more than just publication-oriented statistics training. I would say that the statistics degree has always needed a boost. It used to be that after earning a degree in statistics you might be 70-80% prepared. After graduating, you might need a couple of years to finish learning statistics while you read the MBA, pharmacology, or whatever books. Now it is more like 60-70% prepared; the pace has quickened. However, I think this gap exists for all degree programs at all universities (despite marketing claims to the contrary). Even so, if I wanted to be a Data Scientist, then a statistics degree mixed with something else (OR, IE, econometrics, MBA, comp sci, et al.) is potent. To be a Data Scientist you have to be capable of thinking like a statistician.

As I posted elsewhere, I think the statistics departments and ASA are starting to feel the heat.  There were some articles calling for change in Amstat News and there will be another next month (October, 2013). 

Comment by Pradyumna S. Upadrashta on September 29, 2013 at 12:15pm

@ Mark: I think he was referring to my comment earlier.  

However, I strongly disagree with his self-assessment as someone with "0 training in data science". Vincent is clearly what I would consider a thought leader, so I'm not sure why he would classify himself as someone with "0 training". How can you be seasoned and still have 0 training? I consider that whatever makes one seasoned is precisely the training we're talking about. On the other hand, I have seen people interview who have advanced-level SAS certifications, with absolutely no grounding in the theory/practice of statistical analysis, who I would say are purely entry-level. Statistics is hard, it's deep, it takes time to learn, you have to fail many times before you really intuitively understand and internalize it, ...so a 6-month class in data science wouldn't cut it for me. Can someone who has spoken French their entire life take a 6-month course in French grammar and be considered a senior? Sure. Can someone who has no clue about French take the same class and be considered a senior? Obviously not. So perhaps we should qualify your statement by saying whether this individual with 6 months of training already comes from a solid technical background in stats/maths?

As I said, I would care more about the fluidity of your ability to apply abstract concepts to the situation at hand than I do about the degree, or how you came to be that way. Having a degree and some prior chops just means I don't have to probe too hard to know that you know your stuff -- your committee has already established your chops as a Scientist. In a naive Bayesian sense, I'd start with the assumption that I am less likely to be wrong given that kind of prior, and all I have to do is determine whether or not you're also a Data Artist. It is just as much about passion as it is about knowledge. It's common experience that it is hard(er) (and demotivating) to work with people who just don't care.

Of course, not everyone with an advanced degree is cut out for the realities of applying what they know in a time-sensitive, priority-driven environment where compromises and committee group-think come into play. So again, it's the ability to transition from the abstract to the mundane that really matters, as well as your ability to interact with other people constructively/productively.

I think what makes someone a clear senior is that they "know what they know" and "know what they don't know" and aren't afraid to be in the second category, but also not afraid to be in the first category. A senior should speak their mind because it's the right thing to do -- because other people are depending on your thoughtful input to make the outcome better. A junior always hesitates to speak up and to express their opinion, especially when it matters. A "yes (wo)man" can never be a senior, in my opinion, though all too often that's what happens in practice. You can't be anything but a yes person if you don't really have some unique perspective that challenges the status quo. In mature industries, this is perhaps less important; in a startup environment, being a yes person is detrimental.

Comment by Mark L. Stone on September 29, 2013 at 11:28am

@Vincent Granville: Please read what I wrote. I never said you needed to have training in data science in order to be a senior, seasoned data scientist. I was questioning calling someone with a 6-month data science training program, but without strong prior theoretical grounding, a data scientist, as "scientist" to me is a misnomer in such a situation (analogous to calling a garbageman (on a truck) a sanitary engineer). You may not want to admit the value of your theoretical training, but that does not mean it does not at least indirectly guide some of what you do as a data scientist.

Comment by Vincent Granville on September 29, 2013 at 11:23am

@Mark: You can have no training in data science and still be a senior, seasoned data scientist. Just like you can successfully run a large business without an MBA or any kind of business degree. Actually, there are plenty of such people, and I am one of them.

© 2014   Data Science Central