
This was the subject of a provocative article posted on Oracle's blog two days ago. It certainly shows how far from the reality some big companies are. They confuse people who call themselves data scientists (or get assigned that job title) with those who are true data scientists and might use a different job title. Many times, internal politics creates the confusion, and the company fails to recognize, or to leverage, a real data scientist with success stories to share.

Here is some food for thought from Vincent Granville and @JakePorway. You can add yours in the comment section below.

  • Someone who is unable to provide added value on data is NOT a data scientist, as one of the core components of data science is creating added value on all sorts of data. The bubble in question is the fake data scientist bubble. Real data scientists (like me) generate significant value. They are a dime a dozen, but rarely found in large companies. In my case, I created and manage my own company, and much of the added value is created thanks to automated data science.
  • Also, sometimes, bureaucracy and politics prevent a true data scientist from delivering full value, if she is in a nasty environment, her team is not respected, she's not listened to, has no power, or she is dealing with executives who are totally clueless. Too many times though, we still see PhDs in their ivory tower, unable to deliver value, not understanding the business model, miscalculating the risks, not cooperating with other departments, and focused on beautiful models rather than stuff that works. These people are not data scientists. They might produce great websites such as Yelp, not knowing that it is infested with fake reviews to the point of being useless, and unable to notice it or to fix it.
  • Any data scientist worth their salary will say you should start with a question, NOT the data.
  • A data scientist should navigate across silos, not be confined to one silo, unless working on a very specific, narrowly-defined project. Even then, he/she should seek external data sources, as needed. Data scientists thrive in small to medium-size companies or departments, in flexible companies, as consultants, or as entrepreneurs.
  • There are two types of companies: those that see analytics as an expenditure, and those that see it as an unfair competitive advantage.
  • True data scientists are also salespeople: (1) they promote techniques that are sellable (during their job interview or during corporate meetings), then (2) they sell them at a fair price (measured in number of hours and resources to complete the project) and (3) they finally deliver! In my case, I promise a lot of great leads to potential clients. I deliver beyond expectations, and I use back-end data science techniques to generate a high volume of high-quality, relevant traffic (the client does not even need to know that data science is involved). Then I help the client measure the yield, to make sure that traffic generation is correctly attributed to our (automated) efforts. I get paid after delivery, and have tons of return clients, small and big.


Comment by Prof. Dr. Diego Kuonen on October 22, 2014 at 12:17am

The article posted on the Oracle blog is completely unrealistic and really shows "how far from the reality some big companies are".

With respect to the conclusion in the Oracle blog that "the future of data science is smarter tools, not smarter humans": I completely disagree on this point; see also the wonderful quote by Davenport and Harris at https://twitter.com/DiegoKuonen/status/524445719868878848 and Heer at https://twitter.com/DiegoKuonen/status/522481349324976128

IMHO, the key element for a successful (big) data analytics and data science future is statistical rigor and statistical thinking of humans (including hopefully also statisticians); see also my view at  and/or 

Comment by Malte Isacsson on October 6, 2014 at 12:32pm

I read the article posted on the Oracle blog, and nothing could be more wrong... It is about skills, competence, creativity (and critical thinking) regardless of the tools. I do not think the tools themselves can bring insights to people...

I have done many projects where people do not even realize that they can use the insights and also automate the analysis, i.e. take the conclusions further to improve the business.

Some project results are so counterintuitive that I seriously doubt that "ordinary" (sorry) people would do the analysis at all, i.e. sometimes question what is "common" knowledge.

Comment by David P. Jurist on October 6, 2014 at 11:30am

Insia Fatima, in my experience I have found nothing as "hard" as a closed mind.

Comment by Insia Fatima on October 6, 2014 at 11:27am

I can agree with much of what you're saying, but especially the "selling" point. In my mind, it is simultaneously the most important and the most difficult part of the job. And it's simply because it's not just selling through value, a lot of times, and especially in larger firms, it's hard selling. Sometimes almost down to hustling. At which point, you stop and you wonder, what the heck?

Comment by Serge Terekhov on October 2, 2014 at 9:57am

Thanks for the topic!

Value from DS work may or may not come tonight. What if your mission is to organize the general data culture in your organization (incl. why and how data should be collected, stored, fused, verified, used in many ways or just kept waiting to be used in the future in some way that doesn't exist yet, etc.)? Data in a data-oriented (almost digital) enterprise may have value of its own.

On the other hand, any science (incl. data science) should be a science first. It should give reproducible results subject to scientific verification (with the possibility of being shown to be false). If someone designed a "sophisticated" trading system that brings "value" from the market for a certain time (until he timely and happily quits), is it data science? All the attributes are there (based on data, has a question first, brings value, etc.). My opinion: this particular kind of activity is not data science. Maybe it can be called "empirical data engineering", or the like.

IMHO, data science should be centered on its main subject: data per se. So, data science is about scientific facts about data (not about algorithms and technologies, which should go to data mining, machine learning, statistics, etc., and not about valuable results of data use, which are for specific domain science or engineering).

 

Comment by Sione Palu on October 2, 2014 at 8:14am

Yeah Vincent, I basically agree with what you're saying. There are always new techniques emerging from the academic literature, and it's something that I keep an eye on. However, to me, any new technique that I use is fundamentally for the sole purpose of analysis.

Last year I gave a 4-hour presentation to a team of PhDs (computer scientists & statisticians) on "Tensor Factorization", a topic which is still relatively unknown in machine learning & statistics, but familiar to signal processing experts. It's a difficult topic, but it has been breaking new ground in the last 7 years or so in its applications across a wide range of domains, from text mining, recommender systems, anomaly detection, signal processing, and bioinformatics, and so forth. Most of the attendees didn't understand it, so I had to hold one-on-one discussions after the presentation.

So, new techniques do appear on a regular basis in the literature, but their use is for the purpose of data analysis.

Comment by Vincent Granville on October 2, 2014 at 7:10am

Hi Sione,

There are data scientists who are doing truly new stuff. I've been in analytics for over 20 years (PhD in stats, 1993), and what I do now - including the methodology used, not just the data - is radically different from what I did even 5 years ago. It involves a lot of automation and new algorithms (Jackknife regression, model-free confidence intervals, hidden decision trees, brand new random number generation, feature selection, predictive power) applied to all sorts of data, usually big data sets. It actually goes far beyond processing data: data processing is the tip of the iceberg, but the big picture of what I'm doing is making real-time systems (such as traffic generation for this very website) work automatically and smoothly, and in a scalable way. From scratch and/or using vendor platforms, I design, build, and deploy systems that work, with home-made metrics used to test performance and find areas of improvement. Of course it's data intensive, but it's not just data. A lot of this is architecture and engineering: making various pieces of code / platforms talk smoothly together. We don't have employees; data science has automated many tasks that employees typically do (finding great articles, content generation, spam monitoring, etc.)

This data science has deep repercussions in the way we do business. In my case (and that's true for many authors/professors/researchers who have a similar business model), I change the world as follows:

  1. I own my market thanks to computational marketing efforts based on data science (involves many things, smart paid traffic, social network and other original viral techniques)
  2. I control growth and churn, keeping the right balance of good, short-term revenue with mailing list preservation and growth (involves data science optimization)
  3. Thanks to #1 and #2, I can offer my (upcoming) books almost for free, without using any publisher, and without having it available on Amazon (Amazon is full of bogus reviews). This is a threat to the traditional publishing industry, and even Amazon
  4. Likewise, I offer projects-based state-of-the-art training at no cost, a threat to traditional college education which is horribly expensive, and many times outdated. 
  5. Finally, I have my own research lab, not depending on grant money, neutral, and not managed by old professors protecting their turf at all costs (and not producing real innovation, as a side effect). This is a threat to the patent industry and the VC world, as I release plenty of truly innovative intellectual property, for free - not in specialized journals that take years to be published and nobody reads, but on very popular and high quality blogs, including my own. I call this open intellectual property.

The threat to the "old business world" (2005 and earlier) is caused by more and more individuals doing what I described, leveraging data science to better the world - lowering the cost of many high quality, innovative products and services - and even to become wealthy themselves in the process, without any funding sources.

Vincent

Comment by Sione Palu on October 2, 2014 at 6:56am

  • "Any data scientist worth their salary will say you should start with a question, NOT the data."

I do the opposite. I start with the data, then see if there are empirical laws to be formulated that are generalizable. I take a position similar to that of econophysicists: don't have a theory as a prior. Analyse the data to see if there's a pattern in there; that empirical evidence is then used to formulate a theory that may generalize to laws describing the underlying mechanisms & processes of how the data was generated. A comment by the following author summarizes it better:

"...empirically based modelling where one asks not what we can do for the data (give it a massage), but instead asks what can we learn from the data (about how markets really work)". His article is here: "Response to - Worrying Trends in Econophysics" (http://arxiv.org/ftp/physics/papers/0606/0606002.pdf)

Comment by Sione Palu on October 2, 2014 at 6:43am

IMO, data science is a new term but an old profession. The term is sexy. My job title is one, but when people ask what I do for a job, I tell them that I'm a computer programmer. If they ask a bit further, then I say I specialize in mathematical (numerical) computing. I've been doing numerical computing for over 10 years with the same techniques I'm using now (including the latest techniques/algorithms that have appeared in the literature in recent years) as a data scientist, so it doesn't matter to me if what I'm doing is called data science, numerical computing, data analytics, blah, blah, blah (insert your favourite term here); in the end, what I'm doing is analytics. I'm sure that a new term will arise in the next few years to describe what we now call data science, but the underlying task is the same, i.e., analysing the data in ways that can add value.

Comment by Tariq Muman on October 2, 2014 at 5:06am

What sort of professional that works with data doesn't add value on data?
