
There are two types of data scientists:

  • Vertical data scientists have very deep knowledge in some narrow field. They might be computer scientists very familiar with the computational complexity of all sorting algorithms. Or statisticians who know everything about eigenvalues, singular value decomposition and its numerical stability, and the asymptotic convergence of maximum pseudo-likelihood estimators. Or software engineers with years of experience writing Python code (including graphics libraries) applied to API development and web crawling technology. Or database specialists with strong data modeling, data warehousing, graph database, Hadoop and NoSQL expertise. Or predictive modelers expert in Bayesian networks, SAS and SVM.
  • Horizontal data scientists are a blend of business analysts, statisticians, computer scientists and domain experts. They combine vision with technical knowledge. They might not be experts in eigenvalues, generalized linear models and other semi-obsolete statistical techniques, but they know about more modern, data-driven techniques applicable to unstructured, streaming, and big data, such as the very simple and applied Analyticbridge theorem to build confidence intervals. They can design robust, efficient, simple, replicable and scalable code and algorithms.

DJ Patil, a Horizontal Data Scientist

Horizontal data scientists also come with the following features:

  • They have some familiarity with Six Sigma concepts, even if they don't know the term. In essence, speed is more important than perfection for these analytic practitioners.
  • They have experience in producing success stories out of large, complicated, messy data sets - including measuring that success.
  • They have experience in identifying the real problem to be solved and the data sets (external and internal), database structures and metrics they need, rather than being passive consumers of data sets produced or gathered by third parties who lack the skills to collect or create the right data.
  • They know rules of thumb and pitfalls to avoid, more than theoretical concepts. However, they have a bit more than just basic knowledge of computational complexity, good sampling and design of experiments, robust statistics and cross-validation, modern database design and programming languages (R, scripting languages, MapReduce concepts, SQL).
  • They have advanced Excel and visualization skills.
  • They can help produce useful dashboards (the ones that people really use on a daily basis to make decisions) or alternative tools to communicate insights found in data (orally, by email or automatically - and sometimes in real-time machine-to-machine mode).
  • They think outside the box. For instance, when they create a recommendation engine, they know that it will be gamed by spammers and competing users, thus they put an efficient mechanism in place to detect fake reviews. 
  • They are innovators who create truly useful stuff. Ironically, this can scare away potential employers, who, despite claims to the contrary and for obvious reasons, prefer the good soldier to the disruptive creator.

In my opinion, vertical data scientists are fake data scientists. They are the by-product of our rigid university system, which trains people to become either a computer scientist, a statistician, an operations researcher or an MBA - but not all four at the same time. This is one of the reasons why we have created our data science program. This is also one of the reasons why recruiters can't find data scientists: they find and recruit mostly vertical data scientists. Companies are not yet used to identifying horizontal data scientists - the true money makers and ROI generators among analytic professionals. The reasons are two-fold:

  • Untrained recruiters quickly notice that horizontal data scientists lack some of the traditional knowledge that a true computer scientist, statistician or MBA must have - eliminating horizontal data scientists from the pool of applicants. To detect these pure gems, you need a recruiter who is familiar with software engineering, business analysis, statistics and computer science, who can identify qualities not summarized by typical resume keywords, and who can tell which skills (or missing skills) are critical and which can be overlooked.
  • Horizontal data scientists, faced with the prospect of few job opportunities, and having the real knowledge to generate significant ROI, end up creating their own start-up or working independently, sometimes competing directly against the very companies that are in need of real (supposedly rare) data scientists. After having failed more than once to get a job interview with Microsoft, eBay, Amazon or Google, they never apply again, further reducing the pool of qualified talent.

Hopefully, our data science program will help with this - in particular educating recruiters and hiring managers as well.

Question: Can you name a few horizontal data scientists? Vertical data scientists are a dime a dozen.


Comment


Comment by Vincent Granville on August 11, 2014 at 10:43pm

Hi Alex, I think you have nailed it down. I'm definitely a massively multi-threaded associative thinker, for better or worse - I think for the better.

Comment by Alex Esterkin on August 11, 2014 at 10:36pm

Horizontal vs Vertical has nothing to do with education.  Those who are massively multi-threaded associative thinkers will be horizontal data scientists no matter what - whether they are college dropouts or have PhD degrees.  Detail-oriented deductive types will be vertical no matter what.   These reasoning types are not interchangeable, and in most cases you cannot transform one type into another type by means of training, education, or work experience.

Comment by Vincent Granville on August 21, 2013 at 8:44am

Those who claim to be data scientists and whose insights have no impact on businesses are fake data scientists. Google "fake data science" for more information. You don't need a diploma to be a data scientist - all you need is a track record of success stories, measurable in dollars, and based on leveraging data, to call yourself a data scientist.

Comment by abbas Shojaee on June 24, 2013 at 5:48pm

While I agree with Willy that data science at its birth needs to get specialized, I think having trustworthy data is not the challenge. Imperfection and vagueness, uncertainty, lack of knowledge (probability domain) or ambiguous boundaries (fuzzy domain), and semi-structured or unstructured data (text, voice, images or signals) are inherent parts of reality, and they carry over to data too. Instead of working on models that win internal validity and lose external validity because of the overly perfect data we traditionally try to feed them, we need to embrace the real imperfection and figure it out. In this way horizontal data mining, as Vincent calls it (I prefer to call it Live data mining), behaves like a living animal: agile, adaptable, robust to change and imperfection, moving, engaging, innovative and less framed.

We already have a range of different disciplines to work with imperfect data, but it seems that we need new theories and methods, especially to express the kinds of knowledge that we cannot yet handle perfectly.

Comment by Catalin Ciobanu on June 23, 2013 at 5:24am

Great post. Here's an idea: if you are a great horizontal DS, you most likely have been a vertical DS at some time in the past. You would not know when an analysis is 'good enough' unless you had failed multiple times in the past by going too deep. And going too deep is what vertical is all about (e.g. a 5-year PhD program...)

Comment by Wilco van Ginkel on April 1, 2013 at 8:28am

Vincent,

Thanks for sharing this article - an interesting read, indeed.


However, it reads like the field of data scientists is a binary field: good (horizontal) or fake (vertical). Which - IMHO - is not the case. I believe that it is (or should be) more about data science teams, where each member adds value & different skills.


In such a team there is a place for both horizontal and vertical data scientists. Depending on the task at hand & scope, the distribution of horizontal and vertical within the team might differ.



Keep up the good writing!

Comment by Larry R Myers on March 27, 2013 at 6:52am

Vincent is right.  Our universities are producing specialists in narrow specialties.  Elementary education majors are not required to take any mathematics courses during their certification studies, for example.

Comment by Vincent Granville on March 20, 2013 at 8:38am

To clarify: 

By "horizontal", I mean broad spectrum as acquired after 10+ years of experience working in various industries with different roles (digital analyst, market research, software engineer, statistician - in finance, advertising and environmental statistics, large companies and start-up founder), combined with deep expertise in a few (usually more than one) domains (e.g. support vector machines, API development, Bayesian networks, Python, big data).

Comment by Vincent Granville on March 19, 2013 at 7:07am

@Dmitriy: Why not hire a horizontal data scientist with deep domain and technical expertise in advertising, someone just like me actually?

Comment by Dmitriy Kruglyak on March 18, 2013 at 9:32pm

@Vincent:

I beg to differ. If my business is advertising, I want people with some experience in advertising. I am not as interested in people who never even thought about relevant industry problems and only worked on, for example, designing OCR systems or search engines or health claim prediction or something else that does not readily translate.

"Horizontal" skills are a great building block but they come with a significant cost of ramp-up time to *really* understand the actual problem, the data, the exceptions, the use cases and the business imperatives. As far as giving senior management roles to people without understanding of the industry problems - that is a truly scary thought.

© 2014   Data Science Central