Do data scientists need to be domain experts to deliver good analytics?

During my 30-year career in analytics, prospective employers and clients have often asked me: "How can you help us with data-driven insights when you have not worked in this industry before?" I argue that a greater emphasis on the data scientist's machine learning skills, combined with a partnership with domain experts, is an effective pathway for bringing data science to a business.

Clearly, the description of the data scientist as a mythical unicorn with computer science skills, statistical knowledge and domain expertise (Figure 1) has had an impact. The proliferation of specialised analytics disciplines such as social network analysis, digital analytics, bioinformatics and supply chain analytics lends weight to the argument that domain expertise matters.

[Figure 1: Data Science Venn Diagram]

Source: Drew Conway's Data Science Venn Diagram

There are also anecdotes on the web of data science projects that went pear-shaped because the analysts were not subject matter experts. A deeper look at these anecdotes reveals that the problems were not due to a lack of domain expertise, but to poor data science: over-fitting, bad sampling methods and unnecessary data cleansing. Still, the myth that domain expertise trumps all else persists!

Data mining competitions such as Kaggle and the KDD Cup have demonstrated the opposite, showing that data science can be successfully outsourced to people without domain expertise. Many companies have run competitions on topics as diverse as optimizing flight routes, predicting ocean health and detecting diabetic retinopathy. Data scientists with little or no expertise in the domain have responded brilliantly with useful solutions. Adam Kowalczyk and I won the KDD Cup on yeast gene regulation prediction with no background in biology. Some data scientists, such as David Vogel and Claudia Perlich, have even won across multiple domains, indicating that data science skills are transferable.

The counterargument to Kaggle's success is that in these competitions the domain experts have already generated the hypothesis by posing the right business question and preparing the data (Figure 2), so the competitors need only model and test. But in the brave new world of massive data, with the mathematical tools and computing power to crunch the numbers, the old-world paradigm of hypothesizing before modeling is likely to be challenged. With its approach to language learning, Google has shown a whole new way of understanding the world without any a priori models or theories.

[Figure 2]

Source: Dr. Bhavani Raskutti, Data Mining Lead, Pacific Brands, “Data Mining in Industry: Putting Theory into Practice”, guest lecture Royal Melbourne Institute of Technology, 2011.

So, if domain expertise is not necessary for the steps of posing the business question and analytical problem definition, what about data acquisition and data preparation?

In my experience, domain knowledge about data capture and transformation processes at the sensors can be acquired through exploration of the raw data. Often, good data scientists become subject experts just by playing with the data and asking domain experts about the anomalies they find. For instance, using just such a process, my analytics team at a manufacturing company identified a long-standing but previously undiscovered anomaly in the summarised sales and inventory feed from a large retailer. This anomaly materially affected the retail inventory reporting and had to be fixed programmatically. Subsequently, my data science team members were the acknowledged retail supply chain experts!

Domain expertise is most relevant, perhaps, in the interpretation of insights, particularly insights about the workings of complex physical processes gained through unsupervised learning. One example was the use of the Aster discovery platform to perform root cause analysis of failures in a multi-aircraft fleet from aircraft sensor and maintenance data. While the analysis started with no a priori model, the a posteriori interpretation of the results from the path analysis, and the subsequent follow-up to improve aircraft safety, certainly required domain expertise.

Returning to the original question, "How can you help us with data-driven insights when you have not worked in this industry before?", my response is as follows.

  1. Machine learning (the intersection of computer science and statistics in Figure 1) brings a fresh perspective that leads to new insights. Having no prior domain knowledge can even be an advantage, especially in overcoming long-standing domain bias.
  2. Provided the machine learners have the curiosity and willingness to learn about the company and the domain, along with the humility to ask the domain experts about the subject, they will not only come to understand the domain; through their questioning they will also cross-pollinate with the subject matter experts, so that the team as a whole is stronger.

So, when hiring a data scientist, focus on the machine learning aspect, particularly the desire to play with the data using a number of different techniques and languages. Consider also the analytical skills needed to question and solve problems iteratively. Partner the data scientists with domain experts so that cross-pollination can occur. This, to me, is a better pathway for bringing data science to a business than searching for the elusive unicorn depicted in Figure 1.

Bhavani Raskutti is the Domain Lead for Advanced Analytics at Teradata ANZ. She is responsible for identifying and developing analytics opportunities using Teradata Aster and Teradata's analytics partner solutions. She is internationally recognised as a data mining thought leader and is regularly invited to present at international conferences on Mining Big Data. She is passionate about transforming businesses to make better decisions using their data capital.

This article was originally posted on my Teradata Blog.

  • Vincent Granville

    It certainly doesn't hurt to have domain expertise. In my practice, being a domain expert not only reduces costs (no need to hire someone else) but also improves efficiency, as I can see how to make all the pieces (business side, product development, technical) fit together. The smaller the company, the more you need domain expertise to succeed.

  • Bhavani Raskutti

    Agree that domain expertise (knowledge in general in any field) never hurts. But what I am arguing in the article is that domain expertise in the data scientist is not the most critical component in delivering good insights from data.

    More importantly, it is often unclear what the central domain expertise is. For example, when doing a marketing project for a wholesale manufacturer with some retail outlets, do you need to be an omni-channel marketing expert, a manufacturing consultant, a retail expert, or all of these? My contention is that this all-round expert with just reasonable data science skills will be less successful in getting the most out of the available data than a good data scientist with experience in a few domains but little domain expertise in these fields.

    Ultimately, I am talking about a multi-disciplinary approach, with the data scientist being the domain expert in data science and bringing the analytics approaches that have worked in other domains.

  • Sandra Pickering

    The most important issue is not so much to come in with extensive domain knowledge but to come with an understanding of models and theories outside the data.

    I see too many analysts drawing conclusions from data on, say, buyer behaviour or consumer choice based purely on the data but without an understanding of human psychology or motivational science. From time to time the conclusions are statistically significant yet lacking in practical validity.