
"You need an algorithm, not a Data Scientist". Um...not quite

I recently read a Harvard Business Review (HBR) article “You need an algorithm, not a data scientist”. The author (from an analytics vendor) argues that:

  1. Companies are increasingly trying to do more analysis of their data to find value and are hiring people (data scientists) to do this work. This people-centric approach does not scale.
  2. Some patterns are too subtle to be perceived by humans. The author gives the example of a slowly changing customer profile, which would go unnoticed in a manual examination of the data. Algorithms, however, can monitor this data continuously and at scale, and so are better.
  3. Modern tools “require very little or no human intervention, zero integration time, and almost no need for service to re-tune the predictive model as dynamics change”.
  4. Therefore, the companies that succeed will be those that use automated algorithms, which are faster, more accurate, more scalable, and more adaptive than manual analysis of data.

Other articles present similar arguments [2] [3]. These arguments are off the mark for several reasons. I want to present an alternative perspective. What you actually need is a data scientist and then an algorithm.

Why you need a data scientist and then an algorithm

First of all, I want to address each of the arguments made in the HBR post.

The HBR post’s argument assumes that algorithms and data scientists are mutually exclusive rather than complementary. In fact, the two work together, as in a lab and a factory. With a good organisational structure, ideas are tested as experiments in the Data Science ‘lab’. Once an idea is proven measurably useful, investment is made in productionising the associated Data Science in the ‘factory’. By then, the Data Science will have yielded everything from insights about data quality and data profiles through to the most appropriate visualizations and algorithms for the business problem. In addition, good Data Science should have provided expected performance metrics for the factory product. This helps build a business case for the product and ground expectations.

The HBR post’s argument assumes that organisations are trying to scale with data scientists rather than productionised data products. This is patently not the case, and any organisation taking this approach needs to reconsider its Data Science strategy. The point of Data Science is to be a service. This service can quickly run agile experiments to investigate and quantify business hypotheses about data and help inform the rollout of products. Doing Data Science therefore informs the investment decision in software development, software purchase, software tuning, etc.

The HBR post claims that certain patterns cannot be perceived by humans doing manual analysis but are detectable by an algorithm. This is partially true. Algorithms can certainly work day and night, quickly processing refreshed and streaming data better than any human could ever hope to. However, if the system being analysed is not well understood then appropriate analyses cannot be chosen and tuned before ‘switching on the fire hose’. It is this understanding, modelling, analysing and tuning that is the job of the Data Scientist in collaboration with the domain expert. The Data Scientist does this in part using statistical and machine learning algorithms.
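To make this concrete, here is a minimal Python sketch of the kind of automated check the HBR post has in mind for a slowly changing customer profile. It is an illustration only, not taken from the HBR article or any vendor tool: the function name `detect_drift`, the `z_threshold` parameter, and the spend figures are all assumptions invented for this example.

```python
from statistics import mean, stdev


def detect_drift(baseline, recent, z_threshold=3.0):
    """Flag drift when the mean of a recent window moves away from the baseline.

    z_threshold is a hypothetical tuning knob. A sensible value depends on
    how noisy the metric is, which someone has to establish from the data.
    """
    base_mean = mean(baseline)
    base_sd = stdev(baseline)
    if base_sd == 0:
        raise ValueError("baseline has no variation; a z-score is undefined")
    # Standard error of the mean for a window of len(recent) observations
    standard_error = base_sd / len(recent) ** 0.5
    z = (mean(recent) - base_mean) / standard_error
    return abs(z) > z_threshold, z


# Made-up monthly spend figures for one customer segment (illustrative only)
baseline = [100, 102, 98, 101, 99, 100, 103, 97]
recent = [104, 105, 106, 107]

drifted, z = detect_drift(baseline, recent)
print(f"drifted={drifted}, z={z:.1f}")
```

The code will happily run day and night, but every choice in it is a modelling decision: which metric to watch, how long the baseline should be, how wide the recent window is, and how large a shift matters to the business. Those are the questions the Data Scientist and the domain expert answer before the fire hose is switched on.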

The HBR post claims that modern tools require limited intervention, tuning, and integration. This claim needs to be taken with a pinch of salt. It is common knowledge that the vast majority of time on a data project is spent understanding and cleaning the data. We should be very sceptical of claims that software can simply be ‘turned on’ without the necessary understanding of the data and the problem domain. There are just too many variations in data to make such a confident claim about the capabilities of data software.

Data Science and Algorithms are Complementary

Data Scientists and automation (data products, algorithms, production code, whatever) are complementary functions. Good Data Science supports automation. It quickly adds value by investigating, testing, and quantifying hypotheses about existing data and potential new data.

Simply switching on software ignores the reality of working with data, regardless of the claims of that software. Data is full of nuances, errors and unknown relationships that are best discovered and tested by an expert Data Scientist. This takes time and does not scale, but it does not have to scale. It is the prudent investment you make before spending months on product development and on automating the wrong algorithm against the wrong or broken data.

Data Science done well tells you:

  • what you didn’t already know about the data
  • what an appropriate algorithm should be, given what you now know about the data
  • what the measurable expectations of that algorithm should be when it is automated in production
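As a rough sketch of the last point, the expectation measured in the ‘lab’ can be written down and handed to the ‘factory’ as a check on the automated algorithm. The `Expectation` class, the `meets_expectation` function, the choice of precision as the metric, and the numbers below are all hypothetical, chosen only to illustrate the idea.

```python
from dataclasses import dataclass


@dataclass
class Expectation:
    """A performance expectation established during lab experiments."""
    metric: str
    expected: float   # value measured in the lab on held-out data
    tolerance: float  # how far below the expected value is still acceptable


def meets_expectation(expectation, observed):
    """Return True if the production value is within tolerance of the lab value."""
    return observed >= expectation.expected - expectation.tolerance


# Hypothetical figures: the lab measured 0.80 precision and accepts a 0.05 drop
lab_result = Expectation(metric="precision", expected=0.80, tolerance=0.05)

print(meets_expectation(lab_result, 0.78))  # within tolerance
print(meets_expectation(lab_result, 0.70))  # below tolerance, worth investigating
```

The value of the check comes from the numbers in it, not the code. Without the lab work there is nothing credible to put in `expected` or `tolerance`, and so no way to tell whether the automated algorithm is doing its job.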

You can read more about how to do agile Data Science that is transferable to the ‘production factory’ in my book, Guerrilla Analytics: A Practical Approach to Working with Data, and get the latest news here.


Comment by Sean McClure on November 7, 2015 at 5:49pm

Just left my comment on/to HBR about this article. Too long to post here unfortunately. Link below. I encourage others to post their comments on the site and help educate people on what it is Data Scientists do (and don't do) so the hyped up vendors don't have the loudest voice, and the perhaps naive magazines can better understand. ROI in Data Science comes from this proper understanding. HBR didn't help by posting this piece. Thanks for your great write up.

Comment by Sione Palu on July 27, 2015 at 9:29am

You need both; otherwise it's cheaper to hire a psychic to predict future trends for business problems. Historically, the psychic Rasputin was claimed to do exactly that for the Russian Tsar over a century ago, with no data science involved at all, lo!!!

Comment by Max Galka on July 25, 2015 at 9:44am

I am hesitant to comment about something I have not read firsthand, but if the claim is that companies should automate the work of data scientists, that is the most off-base (and dangerous) thing I have ever heard from the HBR.

I would argue strongly the exact opposite point. Business managers should be putting more emphasis on the human side of their data analysis. There are way too many intangible problems that can arise that only someone who understands statistics and data can identify or would even know to look for.

Way too many companies are jumping aboard the big data bus without understanding the pitfalls. That is why so much of the data analysis you see online is just plain wrong.

Comment by Pradyumna S. Upadrashta on July 24, 2015 at 11:30am

  1. Modern tools “require very little or no human intervention, zero integration time, and almost no need for service to re-tune the predictive model as dynamics change”

Hahahaha. Jokes. I love it.

Comment by LocalPoint Tech on July 24, 2015 at 2:22am

I think the Harvard article misunderstands the purpose of scalability: Scalability comes with people who are flexible enough to grow with new analytic methods. Humans can do more than one thing. :-)
