There have been a number of interesting articles recently discussing the skills a data scientist should or might have. The one entitled The 22 Skills of a Data Scientist is popular (see the 22 skills listed below, or click on the link to read the full article). Earlier this morning, I read another one on LinkedIn: Data Scientist – MUST have skills? The picture below comes from this LinkedIn article. Some of these articles have been posted on our network by external bloggers, for instance skills you need to become a data scientist or Some software and skills that every Data Scientist should know. Popular ones include how to become a data scientist and Are You A Data Scientist?
I tend to disagree with many of these authors. My disagreement can be summarized as follows:
- Rather than defining data scientists by a bunch of skills that few employees possess (though many analytic executives possess all of them and more), it makes more sense to divide data scientists into multiple categories: data engineers, machine learning experts, modelers, business-oriented data scientists, researchers, domain experts, generalists, etc., each possessing a separate skill set. Google six categories of data scientists for details.
- Also, you can train data scientists to have all the required skills. Colleges do a poor job of that, focusing instead on delivering siloed, outdated curricula and being out of touch with the real world. Some modern 6-month training programs will teach the foundations to self-learners; that's the purpose of our free data science apprenticeship, which uses a project-based approach (real-life projects), though there are other alternatives.
The 22 skills in question
Would you add some skills to, or remove some from, this great list created by Matt Reany? First, I'd categorize these skills. Then I would certainly add business acumen, domain expertise, hacking skills, presentation and listening skills, good judgment, not trusting models, the ability to work in a team or with clients, all sorts of databases and file management systems, some data engineering, some data architecture and dashboard design, data detection, real-time analytics, data vendor expertise (vendor selection, benchmarking), and being the metric expert in your company (even deciding which metrics to track and how to collect the data).
- Algorithms (ex: computational complexity, CS theory) DD, DR
- Back-End Programming (ex: Java/Rails/Objective-C) DC, DD
- Bayesian/Monte-Carlo Statistics (ex: MCMC, BUGS) DD, DR
- Big and Distributed Data (ex: Hadoop, Map/Reduce) DB, DC, DD
- Business (ex: management, business development, budgeting) DB
- Classical Statistics (ex: general linear model, ANOVA) DB, DC, DR
- Data Manipulation (ex: regexes, R, SAS, web scraping) DC, DR
- Graphical Models (ex: social networks, Bayes networks) DD, DR
- Machine Learning (ex: decision trees, neural nets, SVM, clustering) DC, DD
- Math (ex: linear algebra, real analysis, calculus) DD, DR
- Optimization (ex: linear, integer, convex, global) DD, DR
- Product Development (ex: design, project management) DB
- Science (ex: experimental design, technical writing/publishing) DC, DR
- Simulation (ex: discrete, agent-based, continuous) DD, DR
- Spatial Statistics (ex: geographic covariates, GIS) DC, DR
- Structured Data (ex: SQL, JSON, XML) DC, DD
- Surveys and Marketing (ex: multinomial modeling) DC, DR
- Systems Administration (ex: *nix, DBA, cloud tech.) DC, DD
- Temporal Statistics (ex: forecasting, time-series analysis) DC, DR
- Unstructured Data (ex: noSQL, text mining) DC, DD
- Visualisation (ex: statistical graphics, mapping, web-based data-viz) DC, DR