There are two types of data scientists:

  • Vertical data scientists have very deep knowledge of some narrow field. They might be computer scientists thoroughly familiar with the computational complexity of all sorting algorithms. Or statisticians who know everything about eigenvalues, singular value decomposition and its numerical stability, and the asymptotic convergence of maximum pseudo-likelihood estimators. Or software engineers with years of experience writing Python code (including graphics libraries) applied to API development and web crawling technology. Or database specialists with strong data modeling, data warehousing, graph database, Hadoop and NoSQL expertise. Or predictive modelers expert in Bayesian networks, SAS and SVM.
  • Horizontal data scientists are a blend of business analyst, statistician, computer scientist and domain expert. They combine vision with technical knowledge. They might not be experts in eigenvalues, generalized linear models and other semi-obsolete statistical techniques, but they know about more modern, data-driven techniques applicable to unstructured, streaming, and big data, such as the very simple and applied Analyticbridge theorem to build confidence intervals. They can design robust, efficient, simple, replicable and scalable code and algorithms.

DJ Patil, a Horizontal Data Scientist

Horizontal data scientists also come with the following features:

  • They have some familiarity with Six Sigma concepts, even if they don't know the term. In essence, speed is more important than perfection for these analytic practitioners.
  • They have experience in producing success stories out of large, complicated, messy data sets - and in measuring that success.
  • They have experience in identifying the real problem to be solved, the data sets (external and internal) they need, the database structures they need, and the metrics they need, rather than being passive consumers of data sets produced or gathered by third parties lacking the skills to collect or create the right data.
  • They know rules of thumb and pitfalls to avoid, more than theoretical concepts. However, they have a bit more than just basic knowledge of computational complexity, good sampling and design of experiments, robust statistics and cross-validation, modern database design, and programming languages (R, scripting languages, MapReduce concepts, SQL).
  • They have advanced Excel and visualization skills.
  • They can help produce useful dashboards (the ones that people really use on a daily basis to make decisions) or alternate tools to communicate insights found in data (orally, by email or automatically - and sometimes in real time, in machine-to-machine mode).
  • They think outside the box. For instance, when they create a recommendation engine, they know that it will be gamed by spammers and competing users, so they put an efficient mechanism in place to detect fake reviews.
  • They are innovators who create truly useful stuff. Ironically, this can scare away potential employers, who, despite claims to the contrary and for obvious reasons, prefer the good soldier to the disruptive creator.

In my opinion, vertical data scientists are fake data scientists. They are the by-product of our rigid university system, which trains people to become either a computer scientist, a statistician, an operations researcher or an MBA - but not all four at the same time. This is one of the reasons why we have created our data science program. This is also one of the reasons why recruiters can't find data scientists: they find and recruit mostly vertical data scientists. Companies are not yet used to identifying horizontal data scientists - the true money makers and ROI generators among analytic professionals. The reasons are two-fold:

  • Untrained recruiters quickly notice that horizontal data scientists lack some of the traditional knowledge that a true computer scientist, statistician, or MBA must have - and eliminate them from the pool of applicants. To detect these pure gems, you need a recruiter who is familiar with software engineering, business analysis, statistics and computer science, who can identify qualities not summarized by typical resume keywords, and who can tell which skills (or missing skills) are critical and which can be overlooked.
  • Horizontal data scientists, faced with the prospect of few job opportunities, and having the real knowledge to generate significant ROI, end up creating their own start-ups or working independently, sometimes competing directly against the very companies that are in need of real (supposedly rare) data scientists. After having failed more than once to get a job interview with Microsoft, eBay, Amazon or Google, they never apply again, further reducing the pool of qualified talent.

Hopefully, our data science program will help with this - in particular by educating recruiters and hiring managers.

Question: Can you name a few horizontal data scientists? Vertical data scientists are a dime a dozen.


Comment by Georgi D. Gospodinov on March 29, 2017 at 11:56am

Very illuminating. Given the state of the data science profession in the industry (with automation looming around the corner, and a certain vague interpretation of data science by employers, which opens up the field to a wide range of skills), one needs to be both horizontal and vertical in order to succeed.

Comment by Paul K. Courtney on September 27, 2016 at 2:25pm

Thank you, thank you, thank you Vincent! When I first heard the term data science a few years ago I looked into it and realized, "Well I guess I've been doing data science then for the last 10 years!" Thinking outside the box is how I've learned SQL, SAS, Java Struts, MS Access (I'm not embarrassed... :) ), some Python and now R, all in the service of getting work done. I'll now take this piece to my HR when they think they need a "real" data scientist for an interesting position.

Comment by mungunbat enkhbayar on August 27, 2015 at 4:54pm

There are lots of articles about big data, data science and data analytics. But most of them do not define what data science is or what a data scientist is. Dr. Vincent has tried to define it so far.

Comment by Ihe Onwuka on March 19, 2015 at 5:15am

The example below illustrates that this article and its arguments are just plain wrong.

The person asking for assistance here is the inventor of XBRL and a CPA - no shortage of business knowledge there. The technology he is trying to convert to now was available when it was invented. All credit to him for what he is trying to do, but it is probably too late now.

It is blatantly obvious to a person with the right technological background that this was a semantic web application all day long. Instead it was implemented with technology that is now otherwise obsolete.

Whether you are a horizontal, diagonal or perpendicular data scientist, if you think you are qualified to do everything and (I'm not at all suggesting Mr Hoffman does) don't talk to people with the right backgrounds at the right time, you will make some very expensive and irreversible mistakes.

Far better, then, to get used to working with people of diverse backgrounds instead of trying to polarise them.

Comment by Carlos on February 5, 2015 at 8:13am

Totally agree with you, Vincent: "You don't need a diploma to be a data scientist - all you need is a track record of success stories, measurable in dollars, and based on leveraging data, to call yourself a data scientist". I've tried to explain it in my blog in a bit more detail: But, What is a data scientist? In my opinion, a good data scientist is a person capable of acquiring new skills quickly; with that guy or, better, a group of such guys, you will find the best solutions for business needs!

Comment by Sudhindra Ramesh Arsikere on September 18, 2014 at 6:11pm

DJ very apt! And hit the nail.  Much needed bifurcation about a technical DS and Functional DS. (DS=Data Scientist)

Comment by Vincent Granville on August 11, 2014 at 10:43pm

Hi Alex, I think you have nailed it down. I'm definitely a massively multi-threaded associative thinker, for better or worse - I think for the better.

Comment by Alex Esterkin on August 11, 2014 at 10:36pm

Horizontal vs Vertical has nothing to do with education. Those who are massively multi-threaded associative thinkers will be horizontal data scientists no matter what - whether they are college dropouts or have PhD degrees. Detail-oriented deductive types will be vertical no matter what. These reasoning types are not interchangeable, and in most cases you cannot transform one type into another type by means of training, education, or work experience.

Comment by Vincent Granville on August 21, 2013 at 8:44am

Those who claim to be data scientists but whose insights have no impact on businesses are fake data scientists. Google "fake data science" for more information. You don't need a diploma to be a data scientist - all you need is a track record of success stories, measurable in dollars, and based on leveraging data, to call yourself a data scientist.

Comment by abbas Shojaee on June 24, 2013 at 5:48pm

While I agree with Willy that data science at its birth needs to get specialized, I think having trustworthy data is not the challenge. Imperfection and vagueness, uncertainty, lack of knowledge (probability domain) or ambiguous boundaries (fuzzy domain), and semi-structured or unstructured data (text, voice, images or signals) are inherent parts of reality, and they are inherent in data too. Instead of working on models that win internal validity and lose external validity because of the overly perfect data that we traditionally try to feed them, we need to embrace the real imperfection and figure it out. In this way horizontal data mining, as Vincent calls it (I prefer to call it live data mining), performs like a living animal: agile, adaptable, robust to change and imperfection, moving, engaging, innovative and less framed.

We already have a range of different disciplines for working with imperfect data, but it seems that we need new theories and methods, especially to express the kinds of knowledge that we cannot yet handle perfectly.
