This is a first attempt at classifying data scientists. I invite you to come up with a better, more comprehensive solution.
The 10 pioneering data scientists listed here were identified as top data scientists, based on their LinkedIn profiles, in our previous article, "Data Science Equation". That article found four skills most strongly linked to data science: analytics, big data, data mining and machine learning. For each pioneer, we counted the endorsements for each of these four skills. We then normalized the counts so that each skill is expressed as a share between 0 and 1, and the four shares sum to 1 (100%) for each individual. This normalization makes the classification problem easier, because it removes differences in total endorsement volume and lets us compare skill mixes directly.
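The normalization step divides each skill's count by the person's total over the four skills. Below is a minimal Python sketch. It assumes the raw counts for one person are available as a dictionary. The function name `normalize` and the counts shown are hypothetical and do not come from the article's table.

```python
# Minimal sketch: convert raw endorsement counts into skill shares that sum to 1.
# The counts used below are hypothetical placeholders, not the article's data.

SKILLS = ["analytics", "big_data", "data_mining", "machine_learning"]


def normalize(counts: dict) -> dict:
    """Divide each skill's endorsement count by the person's total over the four skills."""
    total = sum(counts[skill] for skill in SKILLS)
    if total == 0:
        raise ValueError("no endorsements for any of the four skills")
    return {skill: counts[skill] / total for skill in SKILLS}


if __name__ == "__main__":
    raw = {"analytics": 30, "big_data": 10, "data_mining": 40, "machine_learning": 20}
    shares = normalize(raw)
    print(shares)                # each value lies between 0 and 1
    print(sum(shares.values()))  # 1, up to floating-point rounding
```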
Note that machine learning and analytics are strongly negatively correlated (-0.82), as are big data and data mining (-0.80). All other cross-skill correlations are negligible.
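Cross-skill correlations like these can be computed between skill columns across the people in the table. The following is a minimal Python sketch. It assumes Pearson correlations and one row of normalized shares per person, with the skills in a fixed order. The four rows are invented for illustration and are not the article's data. With 4 skills, `combinations` yields C(4,2) = 6 pairs, which is also the number of possible pairwise scatter plots.

```python
# Sketch: pairwise Pearson correlations between skill columns.
# Rows = people, columns = normalized skill shares (each row sums to 1).
# The rows below are hypothetical illustration data, not the article's table.
from itertools import combinations
from math import sqrt

SKILLS = ["analytics", "big_data", "data_mining", "machine_learning"]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def skill_correlations(rows):
    """Return {(skill_i, skill_j): r} for every pair of skill columns."""
    cols = {s: [row[k] for row in rows] for k, s in enumerate(SKILLS)}
    return {(a, b): pearson(cols[a], cols[b]) for a, b in combinations(SKILLS, 2)}


rows = [
    [0.40, 0.10, 0.30, 0.20],
    [0.20, 0.20, 0.20, 0.40],
    [0.35, 0.05, 0.40, 0.20],
    [0.10, 0.50, 0.10, 0.30],
]

for pair, r in skill_correlations(rows).items():
    print(pair, round(r, 2))
```

Every row sums to 1, so the columns are not independent. A share gained in one skill must be lost in another, which builds some negative correlation into this kind of data.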
Big Data (x-axis) / Machine Learning (y-axis) scatter-plot
Of the 6 possible pairwise scatter plots (one per pair of the 4 skills), the big data / machine learning pair exhibits the strongest cluster structure. Milind Bhandarkar (Pivotal's Chief Scientist) and, to a lesser extent, Eric Colson (former VP Data Science and Engineering at Netflix) are outliers, both very strong in big data.
Who is the purest data scientist?
I compared the 4-skill mix of each of these 10 data scientists (as found in the above table) with the generic data science skill mix identified in the previous article:
Data Science = 0.24 * Data Mining + 0.15 * Machine Learning + 0.14 * Analytics + 0.11 * Big Data
In short, I computed 10 correlations, one per data scientist, between that person's skill mix and these generic weights, to determine who best represents data science.
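This purity index can be sketched as a Pearson correlation between a person's four shares and the four generic weights, with both vectors in the same skill order. The Python sketch below works under that assumption. `purity`, `GENERIC` and the two invented profiles `person_A` and `person_B` are hypothetical names and data, not the article's.

```python
# Sketch of a "purity index": Pearson correlation between one person's skill
# shares and the generic data science weights. Profiles below are invented.
from math import sqrt

# Skill order shared by every vector: data mining, machine learning, analytics, big data.
GENERIC = [0.24, 0.15, 0.14, 0.11]  # weights from the Data Science equation


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def purity(mix):
    """Hypothetical purity index: correlation of a skill mix with GENERIC."""
    return pearson(mix, GENERIC)


# Hypothetical profiles, same skill order as GENERIC, each summing to 1.
people = {
    "person_A": [0.40, 0.20, 0.25, 0.15],
    "person_B": [0.05, 0.15, 0.10, 0.70],
}

scores = {name: purity(mix) for name, mix in people.items()}
purest = max(scores, key=scores.get)    # highest correlation: closest to the generic mix
farthest = min(scores, key=scores.get)  # lowest correlation: farthest from the generic mix

for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score:.2f}")
```

Pearson correlation does not change when one vector is rescaled by a positive factor. So the generic weights, which sum to 0.64, do not need to be renormalized before they are compared with shares that sum to 1.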
It turns out that Dean Abbott is closest to the 'average' (the highest correlation, which I define as the purest), while Milind Bhandarkar (a big data / Hadoop guy) is farthest from the 'center'. Despite repeated claims (by myself and others) that I am a pure data scientist, I score only 0.43. (Sure, I am also a product, marketing, finance and entrepreneur kind of guy, not just a data scientist, but those extra skills were excluded from this experiment.) Surprisingly, Kirk Borne, known as an astrophysicist, scores high on the data science purity index. So does Gregory, who is known as a data miner.