There are two types of data scientists:
- Vertical data scientists have very deep knowledge in some narrow field. They might be computer scientists very familiar with the computational complexity of all sorting algorithms. Or statisticians who know everything about eigenvalues, singular value decomposition and its numerical stability, and the asymptotic convergence of maximum pseudo-likelihood estimators. Or software engineers with years of experience writing Python code (including graphics libraries) applied to API development and web crawling technology. Or database specialists with strong data modeling, data warehousing, graph database, Hadoop and NoSQL expertise. Or predictive modelers who are experts in Bayesian networks, SAS and SVM.
- Horizontal data scientists are a blend of business analysts, statisticians, computer scientists and domain experts. They combine vision with technical knowledge. They might not be experts in eigenvalues, generalized linear models and other semi-obsolete statistical techniques, but they know about more modern, data-driven techniques applicable to unstructured, streaming, and big data, such as the very simple and applied Analyticbridge theorem to build confidence intervals. They can design robust, efficient, simple, replicable and scalable code and algorithms.
DJ Patil, a horizontal data scientist
Horizontal data scientists also come with the following features:
- They have some familiarity with Six Sigma concepts, even if they don't know the term. In essence, for these analytic practitioners, speed is more important than perfection.
- They have experience producing success stories out of large, complicated, messy data sets - including measuring that success.
- They have experience identifying the real problem to be solved, the data sets (external and internal) they need, the database structures they need and the metrics they need, rather than being passive consumers of data sets produced or gathered by third parties who lack the skills to collect or create the right data.
- They know rules of thumb and pitfalls to avoid, more than theoretical concepts. However, they have a bit more than just basic knowledge of computational complexity, good sampling and design of experiments, robust statistics and cross-validation, modern database design and programming languages (R, scripting languages, MapReduce concepts, SQL).
- They have advanced Excel and visualization skills.
- They can help produce useful dashboards (the ones that people really use on a daily basis to make decisions) or alternative tools to communicate insights found in data (orally, by email or automatically - and sometimes in real-time, machine-to-machine mode).
- They think outside the box. For instance, when they create a recommendation engine, they know that it will be gamed by spammers and competing users, so they put an efficient mechanism in place to detect fake reviews.
- They are innovators who create truly useful stuff. Ironically, this can scare away potential employers, who, despite claims to the contrary and for obvious reasons, prefer the good soldier to the disruptive creator.
In my opinion, vertical data scientists are fake data scientists. They are the by-product of our rigid university system, which trains people to become either a computer scientist, a statistician, an operations researcher or an MBA - but not all four at the same time. This is one of the reasons why we have created our data science program. This is also one of the reasons why recruiters can't find data scientists: they find and recruit mostly vertical data scientists. Companies are not yet used to identifying horizontal data scientists - the true money makers and ROI generators among analytic professionals. The reasons are two-fold:
- Untrained recruiters quickly notice that horizontal data scientists lack some of the traditional knowledge that a true computer scientist, statistician or MBA must have, and eliminate them from the pool of applicants. To detect these pure gems, you need a recruiter who is familiar with software engineering, business analysis, statistics and computer science, who can identify qualities not summarized by typical resume keywords, and who can tell which missing skills are critical and which can be overlooked.
- Horizontal data scientists, faced with the prospect of few job opportunities, and having the real knowledge to generate significant ROI, end up creating their own start-ups or working independently, sometimes competing directly against the very companies that are in need of real (supposedly rare) data scientists. After having failed more than once to get a job interview with Microsoft, eBay, Amazon or Google, they never apply again, further reducing the pool of qualified talent.
Hopefully, our data science program will help with this - in particular by educating recruiters and hiring managers as well.
Question: Can you name a few horizontal data scientists? Vertical data scientists are a dime a dozen.