Implementing data science projects – the five essential skill sets

The future of business, it is argued, is digital. At the core of this digital transformation is the ability to harness data to enable better business decisions. Typically, organizations have teams of experts who apply diverse analytic tools and techniques to existing data sets to make sense of the data. The more statistically advanced among these teams work on typical ‘data science’ problems: problems where you need to apply sophisticated algorithms to large data sets to derive business-relevant insights. These involve either the application of advanced statistical techniques (such as support vector machines or artificial neural networks) or the handling of very large data sets (running into petabytes, for example). Whether you are just starting off or are a pro, chances are that you already have a couple of parallel data science projects going on in your organization.

The implicit promise is that all your data, which had been lying idle for so long, can now start contributing to better business decisions. All you need to do is gather data from these multiple sources, analyze it, and generate business-relevant insights. Yet you stumble upon new challenges every day in managing these projects, defining the outputs, and trying to link the outputs to the business goals. So how exactly is a ‘data science’ project implemented? What are the key skills you or your team need for creating such data science solutions?

In this post, I highlight some of the key aspects which, in my opinion, are essential for driving successful data analytics solutions and thinking. I identify five key skills/personalities which I believe are central to the success of any data science endeavor, and I explain why a combination of these skills is essential for deriving business-relevant insights and creating scalable solutions.

The Five towers of a data science solution

Data science projects generally require three essential skills: statistics/machine learning skills, business skills, and coding skills. It is not a stretch to say that a person possessing all of these skills is extremely rare, a unicorn in data science parlance. Most people have only one or two of them. This is why identifying the right skill composition of a data science team, and finding ways for its members to collaborate, becomes essential. Further, two additional skills which people often overlook are data visualization skills and project/product management skills.
In the following sections, I briefly describe each of the five profiles that I believe every data science project requires:

1. The Professors (or the Algorithms team)

The Professor is the person responsible for all your algorithms and for implementing them in some statistical computing language. She is typically a PhD or master’s degree holder with relevant experience in creating and handling data models and in advanced statistical and machine learning techniques. Her key skills include one or more of R, Python, algorithms, and machine learning. The professor and her team are key to identifying the new and existing algorithms which can help you generate insights and do things with data which you never thought were possible.

Yet the models developed by this team can quickly become very difficult to implement (imagine a combination of natural language processing, machine learning, and social network analysis in a single module) and impossible to scale and deploy unless supported by some other key skill sets.

2. The Data Nerds (or The Big Data team)

The Data Nerd is the person who can handle loads of (big) data without batting an eyelid. Typically able to find any needle in any haystack, the big data person can build castles and databases in the cloud. She is proficient in skills such as ETL, big data, and cloud computing platforms. These people form the backbone of data science projects and are key to making solutions scalable and deployable. It is often said that almost 80% of the time in any analytics project is spent on gathering, cleaning, and massaging the data.
Linking this work to the business objectives is the obvious next step. And this is where the domain expert makes her entrance.

3. The Suits (or The Domain Experts)

While adequate expertise in the first two skill sets ensures great data models that work, you will need a domain expert or a business person to actually put them to (your clients’) use. Typically, this person is well versed in industry-specific analytics and measures, and has excellent communication and presentation skills. She typically has an MBA background and/or years of industry experience.

4. The Data Designers (or the Visualization and Design team)

Another increasingly important aspect of a data science project is the design and visualization of the results and analysis. This is essential since you are trying to present sophisticated analysis to people who may not have experience, training, or interest in statistical and data science methods. Add to this the fact that all your outputs now need to be responsive, i.e., display equally well on a laptop, tablet, or mobile phone. The outputs and insights you generate must be a natural part of the end user’s workday. Thus, understanding the user journey, the personas, and the user interactions becomes crucial. You may not be building the next Apple, but a reasonably intuitive interface is still essential, especially if you are building guided analytics projects.

5. The Cat Herders (or The Product Managers)

The product manager needs to manage the diverse group, make its members work together, and get them to agree on key points. The product manager for data science projects needs to be an all-rounder with program management, client interfacing, data science, and team management skills. Experience in herding cats is a bonus. These are the people who need to understand the data models as well as the end user, and who guide the outcomes of the cross-functional team towards measurable business goals.

Very often organizations form teams which consist of experts with only a subset of these skills. The allure of data gathering, data cleaning, model building, model optimizing, and generating reports is not just interesting, but also very addictive. Getting absorbed in these activities is also one of the most oft-repeated mistakes in the data science world. To avoid this, make sure you do not lose sight of the ‘whys’ by concentrating too much on the ‘hows’. A correctly balanced team is one of the basic prerequisites on your journey towards solving ever more sophisticated and challenging data science problems.

