
Do data scientists need to be domain experts to deliver good analytics?

During my 30-year analytics career, prospective employers and clients have often asked me: "How can you help us with data-driven insights when you have not worked in this industry before?" I argue for greater emphasis on machine learning skills in the data scientist, and for their partnership with domain experts, as an effective pathway to bring data science to a business.

Clearly, the description of data scientist as the mythical unicorn who has computer science skills, statistical knowledge and domain expertise (Figure 1) has had an impact. The proliferation of different analytics disciplines such as social network analysis, digital analytics, bio-informatics and supply chain analytics, lends weight to the argument that domain expertise definitely matters.

[Figure 1]

Source: Drew Conway's Data Science Venn Diagram

There are also anecdotes on the web of data science projects that went pear-shaped because the analysts were not subject matter experts. A deeper look into these anecdotes reveals that the issues are due not to a lack of domain expertise but to poor data science, such as over-fitting of data, bad sampling methods and unnecessary data cleansing. Still, the myth that domain expertise trumps all else continues!
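The over-fitting failure named above can be made concrete with a minimal sketch. Everything in it is hypothetical and chosen only for illustration: the synthetic data, the 30% label noise, the "memorizer" model and the fixed threshold rule. It assumes nothing about any particular project. The point is that a model which reproduces its training data perfectly can still do worse on unseen data, and that spotting this takes a holdout set rather than domain knowledge.

```python
import random


def make_data(n, rng, noise=0.3):
    """Hypothetical data: label is 1 when x > 0.5, then flipped with probability `noise`."""
    rows = []
    for _ in range(n):
        x = rng.random()
        y = 1 if x > 0.5 else 0
        if rng.random() < noise:
            y = 1 - y  # label noise that no model should try to learn
        rows.append((x, y))
    return rows


def memorizer(train):
    """Over-fitted model: predict the label of the single closest training point."""
    def predict(x):
        return min(train, key=lambda row: abs(row[0] - x))[1]
    return predict


def threshold_rule(x):
    """Simple model: a fixed cut at 0.5 that ignores the noise (assumed, not learned)."""
    return 1 if x > 0.5 else 0


def accuracy(predict, rows):
    """Fraction of rows whose label the model predicts correctly."""
    return sum(predict(x) == y for x, y in rows) / len(rows)


rng = random.Random(0)          # fixed seed so the run is repeatable
train = make_data(500, rng)
holdout = make_data(500, rng)   # data the models never saw

overfit = memorizer(train)
train_acc = accuracy(overfit, train)
holdout_acc = accuracy(overfit, holdout)
simple_holdout_acc = accuracy(threshold_rule, holdout)

print(f"memorizer  train={train_acc:.2f}  holdout={holdout_acc:.2f}")
print(f"threshold  holdout={simple_holdout_acc:.2f}")
```

The memorizer scores perfectly on the data it was built from because it has learned the noise along with the signal; the gap between its training and holdout accuracy is the warning sign.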

Data mining competitions such as Kaggle and KDD have demonstrated the opposite and shown how data science can be successfully outsourced to people without domain expertise. Many companies have run competitions on such diverse topics as optimizing flight routes, predicting ocean health and diabetic retinopathy detection. Data scientists with little or no expertise in the domain have responded brilliantly with useful solutions. Adam Kowalczyk and I won the KDD Cup on yeast gene regulation prediction with no background in biology. Some data scientists, such as David Vogel and Claudia Perlich, have even won across multiple domains, indicating that data science skills are transferable across domains.

The counter-argument to Kaggle’s success is that in these competitions, the domain experts have already generated the hypothesis by posing the right business question and preparing the data (Figure 2), and the competitors need only model and test. But in the brave new world of massive data, along with the mathematical tools and computing power to crunch these numbers, the old-world paradigm of hypothesizing before modeling is likely to be challenged. Google has shown a whole new way of understanding the world without any a priori models or theories with their approach to language learning.

[Figure 2]

Source: Dr. Bhavani Raskutti, Data Mining Lead, Pacific Brands, “Data Mining in Industry: Putting Theory into Practice”, guest lecture, Royal Melbourne Institute of Technology, 2011.

So, if domain expertise is not necessary for the steps of posing the business question and analytical problem definition, what about data acquisition and data preparation?

In my experience, domain knowledge about data capture and transformation processes at the sensors can be acquired through exploration of the raw data. Often, good data scientists become subject experts just by playing with the data and asking domain experts about the data anomalies. For instance, using just such a process, my analytics team in a manufacturing company identified a long-standing but previously undiscovered anomaly in the summarised sales and inventory feed from a large retailer. This anomaly materially affected the retail inventory reporting and had to be fixed programmatically. Subsequently, my data science team members were the acknowledged retail supply chain experts!

Domain expertise is most relevant, perhaps, in the interpretation of insights, particularly those insights gained using unsupervised learning about the workings of complex physical processes. An example of just such a situation was the use of the Aster discovery platform to perform root cause analysis of failures in a multiple aircraft fleet from aircraft sensor and maintenance data. While the analysis started with no a priori model, a post priori interpretation of the results from the path analysis and the subsequent follow-up to improve aircraft safety certainly required domain expertise.

Returning to the original question, ‘How can you help us with data-driven insights when you have not worked in this industry before?’, my response is as follows.

  1. Machine learning (the intersection of computer science and statistics in Figure 1) brings a fresh perspective that leads to new insights, and having no prior domain knowledge can be an advantage, especially in overcoming long-standing domain bias.
  2. Provided the machine learners have curiosity and willingness to learn about the company and domain along with the humility to ask the domain experts about the subject, they will not only understand the domain, but through their questioning they will cross-pollinate the subject matter experts so the team as a whole is stronger.

So, when hiring a data scientist, focus on the machine learning aspect, particularly, the desire to play with the data using a number of different techniques and languages. Consider also the analytical skills to question and solve problems iteratively. Partner the data scientists with domain experts so cross-pollination can occur. This, to me, is a better pathway for bringing data science to a business than searching for the elusive unicorn depicted in Figure 1.

Bhavani Raskutti is the Domain Lead for Advanced Analytics at Teradata ANZ. She is responsible for identifying and developing analytics opportunities using Teradata Aster and Teradata’s analytics partner solutions. She is internationally recognised as a data mining thought leader and is regularly invited to present at international conferences on Mining Big Data. She is passionate about transforming businesses to make better decisions using their data capital.

This article was originally posted on my Teradata Blog.

Comment by Sandra Pickering on November 4, 2016 at 8:21am

The most important issue is not so much to come in with extensive domain knowledge but to come with an understanding of models and theories outside the data.

I see too many analysts drawing conclusions from data on, say, buyer behaviour or consumer choice based purely on the data but without an understanding of human psychology or motivational science. From time to time the conclusions are statistically significant yet lacking in practical validity.

Comment by Bhavani Raskutti on August 2, 2015 at 2:11pm

Agree that domain expertise (knowledge in general in any field) never hurts. But what I am arguing in the article is that domain expertise in the data scientist is not the most critical component in delivering good insights from data.

More importantly, it is often unclear what the central domain expertise is. For example, when doing a marketing project for a wholesale manufacturer with some retail outlets, do you need to be an omni-channel marketing expert, a manufacturing consultant, a retail expert, or all of these? My contention is that this all-round expert with just reasonable data science skills will be less successful in getting the most out of the available data than a good data scientist with experience in a few domains but little domain expertise in these fields.

Ultimately, I am talking about a multi-disciplinary approach with the data scientist being the domain expert in data science and bringing the analytics approaches that have worked in other domains.

Comment by Vincent Granville on July 30, 2015 at 10:32am

It certainly doesn't hurt to have domain expertise. In my practice, being a domain expert not only reduces costs (no need to hire someone else) but also improves efficiency, as I can see how to make all the pieces (business side, product development, technical) fit together. The smaller the company, the more you need domain expertise to succeed.

Comment by Pradyumna S. Upadrashta on July 24, 2015 at 10:07am

If your Data Science results don't deliver deep analytic insights into the process domain, what exactly are you actually doing/delivering? So, even if a Data Scientist isn't an expert at the beginning of an engagement, (s)he should be by the end. The question assumes that a Data Scientist can stay hands off of a domain and continue to add relevant value -- but knowing the right questions demands domain experience. Knowing what hasn't been asked before is also a form of domain expertise. You have to have some idea of what you are looking for, at the very least, to know when you have found a gold nugget vs. fool's gold. Insights can happen through an unsupervised process, but to understand their relevance, and how they might be exploited to drive value, requires the ability to translate those insights into domain-specific knowledge.

Comment by Arthur Tabachneck on July 24, 2015 at 9:59am

Very nice article, Bhavani! I don't usually agree with Vincent but, in this case, I do, though possibly for a different reason. We all can eventually be replaced but, in many situations (regardless of organization size or state), the analytical goals aren't immediately clear and the data was collected for a totally different purpose.

Isn't it common that it takes about 5 years to gain expertise in a given area? And, the more expertise one has (whether it be learned on the job, from reading, or from learning from domain expert colleagues), the more adept they will be at choosing the right data, at the best levels to accomplish the analytical needs.

Sure, sometimes the data is already exactly what's needed, and the goals are known. In such cases, I agree, domain expertise may not be needed and, in fact, may blind one to the many better possibilities.

Art, CEO, AnalystFinder.com 

Comment by Chris Barnes on June 15, 2015 at 3:55pm

This article effectively highlights many of the approaches I have taken over >40 years of job experience, entering originally through the (pure/applied) mathematics door. In some areas where I have been engaged for a decade or so, I have learned so much from the subject matter experts that I became a domain expert (e.g. hydrology, stable isotopes), whereas in others (e.g. Forestry) a loss of a domain expert to work with forced a change of direction. For details, see my pubs. etc. on academia.com.

Comment by Pradyumna S. Upadrashta on April 26, 2015 at 8:35am

Need vs. "Nice to have".  Don't "need".  But, usually "Nice to have".  Having domain expertise certainly helps to accelerate to a solution.  On the other hand, sometimes not knowing why something can't be done, can lead to innovation.

Comment by Bob Vanderheyden on April 16, 2015 at 3:04pm

If the Data Scientist can "borrow" expertise from data-savvy people within the company, then they don't necessarily need to be domain experts. In my 24 years doing the work of what is now called "Data Science", the most egregious errors that I've seen were made due to a lack of understanding of the data and how it was derived within the company's operations (a key component of "domain expertise"). In fact, I can't recall any true errors in the work that didn't have a lack of understanding of this sort as a key contributor.

Other meaningful, well-done analyses were never acted upon because the analyst couldn't sufficiently explain the results in the context of the business and its operations, which came from a lack of "domain expertise". If an analysis isn't acted upon, it provides zero value, so the Data Scientist isn't successful in their work.

While a Data Scientist doesn't absolutely have to have domain expertise to have success, the risk of poor results increases greatly if they don't (or can't borrow expertise from somewhere).

Comment by Steven Rosen on April 2, 2015 at 9:41am

I believe a data scientist SHOULD have -- or should at least aspire to having -- domain expertise. Ultimately, a data science project supports a business objective. In addition, non-technical "end-users" are often wary of deploying solutions that result from data and statistical processes they don't understand. A data scientist who has domain expertise sufficient to understand and act successfully on a given business objective, and who can talk the domain expert's "language", stands a much greater chance of developing a "tool" which will work properly and be more readily adopted within the organization. Domain expertise not only helps to ensure the relevancy of the "tool" but also builds trust among the folks asked to use or depend on it.

Comment by Bhavani Raskutti on March 31, 2015 at 6:13pm

Vincent.

I wasn't arguing against domain expertise. Just pointing out that the data scientists need not come into the company as domain experts -- they can imbibe domain knowledge from the experts around them and in turn they can impart the machine learning knowledge to the domain experts.

Agree that predictive model development can be totally automated especially if the modelling is around the same data set; however, the way the business makes use of the models and the feedback loop to refine the questions are still something the business needs to work out in partnership with the data scientists.
