I recently read a Harvard Business Review (HBR) article, “You need an algorithm, not a data scientist”. The author (from an analytics vendor) argues that algorithms, not data scientists, are what organisations should invest in.
Other articles present similar arguments. These arguments are off the mark for several reasons, and I want to present an alternative perspective: what you actually need is a data scientist and then an algorithm.
First of all, I want to address each of the arguments made in the HBR post.
The HBR post’s argument assumes that algorithms and data scientists are mutually exclusive rather than complementary. In fact, the two work together, as in a lab/factory analogy. With a good organisational structure, ideas are ‘experimented’ with using Data ‘Science’ in the ‘lab’. When proven measurably useful, investment is then made in productionising the associated Data Science in the ‘factory’. The Data Science will have yielded everything from insights about data quality and data profiles through to the most appropriate visualizations and algorithms for the business problem. In addition, good Data Science should have provided expected performance metrics for the factory product. This helps build a business case for the product and grounds expectations.
The HBR post’s argument assumes that organisations are trying to scale with data scientists rather than with productionised data products. This is patently not the case, and any organisation taking this approach needs to reconsider its Data Science strategy. The point of Data Science is to be a service. This service can quickly run agile experiments to quantify and investigate business hypotheses about data and help inform the roll-out of products. Doing Data Science therefore informs the investment decision in software development, software purchase, software tuning, etc.
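To make this concrete, here is a minimal sketch (with made-up numbers) of one such agile experiment: a permutation test of the hypothesis that customers acquired through one channel spend more on average than customers from another. The channel names and figures are purely illustrative.

```python
import random
from statistics import mean

random.seed(0)  # reproducible experiment

# Hypothetical spend figures for two acquisition channels
channel_a = [54, 61, 58, 65, 59, 62, 57, 66]
channel_b = [48, 52, 50, 55, 49, 53, 51, 47]

observed = mean(channel_a) - mean(channel_b)  # observed gap in average spend
pooled = channel_a + channel_b

# Shuffle the channel labels many times: how often does chance
# alone produce a gap at least as large as the one we observed?
trials = 10_000
extreme = 0
for _ in range(trials):
    random.shuffle(pooled)
    diff = mean(pooled[:8]) - mean(pooled[8:])
    if diff >= observed:
        extreme += 1

p_value = extreme / trials
print(f"observed gap: {observed:.2f}, p-value: {p_value:.4f}")
```

A small, quick experiment like this quantifies a hypothesis before anyone commits to building or buying software around it.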
The HBR post claims that certain patterns cannot be perceived by humans doing manual analysis but are detectable by an algorithm. This is partially true. Algorithms can certainly work day and night, quickly processing refreshed and streaming data better than any human could ever hope to. However, if the system being analysed is not well understood then appropriate analyses cannot be chosen and tuned before ‘switching on the fire hose’. It is this understanding, modelling, analysing and tuning that is the job of the Data Scientist in collaboration with the domain expert. The Data Scientist does this in part using statistical and machine learning algorithms.
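As an illustration of that tuning work, here is a minimal sketch, with invented readings and labels, of what a Data Scientist might do before an anomaly detector is switched on against streaming data: choose a z-score threshold on labelled historical data rather than guess one.

```python
from statistics import mean, stdev

# Historical 'lab' data: a clean baseline window and a labelled validation set
baseline = [10.1, 9.8, 10.3, 10.0, 9.9, 10.2]
validation = [(10.5, False), (11.0, True), (9.9, False), (10.0, False)]

mu, sigma = mean(baseline), stdev(baseline)

def is_anomaly(x, threshold):
    """Flag a reading whose z-score against the baseline exceeds the threshold."""
    return abs(x - mu) / sigma > threshold

def accuracy(threshold):
    """Fraction of validation points the detector classifies correctly."""
    hits = sum(is_anomaly(x, threshold) == label for x, label in validation)
    return hits / len(validation)

# The Data Scientist compares candidate thresholds before anything is deployed
best = max([2.0, 3.0, 4.0], key=accuracy)
print(f"chosen threshold: {best}, validation accuracy: {accuracy(best):.2f}")
```

Only after this kind of offline tuning does it make sense to point the algorithm at the fire hose; an untuned threshold here would flag ordinary readings as anomalies.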
The HBR post claims that modern tools require limited intervention, tuning, and integration. This claim needs to be taken with a pinch of salt. It is common knowledge that the vast majority of time on a data project is spent understanding and cleaning the data. We should be very sceptical of claims that software can simply be ‘turned on’ without the necessary understanding of the data and the problem domain. There are just too many variations in data to make such a confident claim about the capabilities of data software.
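The kind of understanding that cannot simply be ‘turned on’ looks something like the following sketch, which profiles some hypothetical customer records exhibiting typical quality problems (missing values, inconsistent types, duplicates) before any algorithm touches them.

```python
from collections import Counter

# Hypothetical records with the usual real-world defects
records = [
    {"id": 1, "age": 34,   "country": "IE"},
    {"id": 2, "age": None, "country": "ie"},  # missing age, inconsistent casing
    {"id": 3, "age": "41", "country": "UK"},  # age stored as a string
    {"id": 3, "age": "41", "country": "UK"},  # duplicate row
]

def profile(rows):
    """Return simple data-quality counts per field, plus a duplicate count."""
    report = {}
    for field in rows[0]:
        values = [r[field] for r in rows]
        report[field] = {
            "missing": sum(v is None for v in values),
            "types": Counter(type(v).__name__ for v in values),
        }
    seen, duplicates = set(), 0
    for r in rows:
        key = tuple(sorted(r.items()))
        if key in seen:
            duplicates += 1
        seen.add(key)
    report["_duplicate_rows"] = duplicates
    return report

print(profile(records))
```

Every issue this surfaces (a missing age, ages stored as strings, a duplicated row) needs a decision from someone who understands the data and the problem domain; no amount of ‘turning on’ software makes those decisions for you.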
Data Scientists and automation (data products, algorithms, production code, whatever) are complementary functions. Good Data Science supports automation. It quickly adds value by investigating, testing, and quantifying hypotheses about existing data and potential new data.
Simply switching on software ignores the reality of working with data, regardless of the claims made for that software. Data is full of nuances, errors, and unknown relationships that are best discovered and tested by an expert Data Scientist. This takes time and does not scale, but it does not have to scale. It is the necessary, prudent investment you make before spending months in product development and automation of the wrong algorithm on the wrong or broken data.
Data Science done well tells you:
You can read more about how to do agile Data Science that is transferable to the ‘production factory’ in my book Guerrilla Analytics: A Practical Approach to Working with Data, and get the latest news here.