This is a first attempt at classifying data scientists. I invite you to produce a more comprehensive, better solution.
The 10 pioneering data scientists listed here were identified as top data scientists in our previous article, "Data Science Equation", based on their LinkedIn profiles. For each pioneer, we computed the number of endorsements for each of the top four data science related skills: analytics, big data, data mining and machine learning. These skills were identified in our previous article as the ones most strongly linked to data science. We then normalized the counts, so that each one is expressed here as a ratio between 0 and 1 and, for each individual, the total aggregated over these four skills is 100%. This puts all profiles on the same scale, which makes our classification problem easier.
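The normalization step can be sketched as follows. This is a minimal illustration, not the exact procedure used for the table: the names and endorsement counts below are hypothetical placeholders, not the real LinkedIn numbers, and the `normalize` helper is introduced here only for the example.

```python
# Hypothetical endorsement counts per skill (placeholders, NOT real LinkedIn data).
counts = {
    "Person A": {"analytics": 120, "big data": 60, "data mining": 90, "machine learning": 30},
    "Person B": {"analytics": 10, "big data": 70, "data mining": 15, "machine learning": 5},
}


def normalize(skill_counts):
    """Turn raw endorsement counts into shares that sum to 1 for one person."""
    total = sum(skill_counts.values())
    if total == 0:
        raise ValueError("cannot normalize a profile with zero endorsements")
    return {skill: n / total for skill, n in skill_counts.items()}


# One normalized 4-skill profile per person.
shares = {name: normalize(c) for name, c in counts.items()}

for name, profile in shares.items():
    print(name, {skill: round(value, 2) for skill, value in profile.items()})
```

After this step, someone with 300 endorsements and someone with 100 are compared on their skill mix only, not on how many endorsements they have collected.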
Note that the correlation between machine learning and analytics is strongly negative (-0.82). Likewise, the correlation between big data and data mining is strongly negative (-0.80). All other cross-skill correlations are negligible.
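For readers who want to reproduce this kind of cross-skill comparison, here is a minimal sketch. I assume a plain Pearson correlation computed across individuals, and the rows of `shares` are hypothetical normalized profiles, not the values from the table, so the output will not match the -0.82 and -0.80 quoted above.

```python
import math
from itertools import combinations

SKILLS = ["analytics", "big data", "data mining", "machine learning"]

# Hypothetical normalized profiles, one row per person (each row sums to 1).
shares = [
    (0.40, 0.20, 0.30, 0.10),
    (0.10, 0.60, 0.10, 0.20),
    (0.25, 0.15, 0.35, 0.25),
    (0.15, 0.30, 0.15, 0.40),
]


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (norm_x * norm_y)


# Transpose the rows so that each column holds one skill across all people.
columns = list(zip(*shares))

# One correlation per pair of skills: 6 pairs for 4 skills.
correlations = {
    (SKILLS[i], SKILLS[j]): pearson(columns[i], columns[j])
    for i, j in combinations(range(len(SKILLS)), 2)
}

for pair, r in correlations.items():
    print(pair, round(r, 2))
```

One caveat when reading such numbers: because the four shares of each person sum to 1, a higher share in one skill forces lower shares elsewhere, so some negative correlation between skills is expected from the normalization alone.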
Notes:
Big Data (x-axis) / Machine Learning (y-axis) scatter-plot
The big data / machine learning combo exhibits the strongest cluster structure among the 6 possible scatter-plots (one per pair of skills). Milind Bhandarkar (Pivotal's Chief Scientist), and to a lesser extent Eric Colson (former VP Data Science and Engineering at Netflix), are outliers, both very strong in big data.
Comments
Who is the purest data scientist?
I compared the 4-skill mix of each of these 10 data scientists (as found in the above table), with the generic data science skill mix identified in the previous article (Data Science = 0.24 * Data Mining + 0.15 * Machine Learning + 0.14 * Analytics + 0.11 * Big Data). In short, I computed 10 correlations (one per data scientist) to determine who best represents data science.
It turns out that Dean Abbott is closest to the 'average' (which I defined as purest), while Milind Bhandarkar (a Big Data, Hadoop guy) is farthest from the 'center'. Despite repeated claims (by myself and others) that I am a pure data scientist, I score only 0.43. Sure, I'm also some kind of product / marketing / finance / entrepreneur guy, not just a data scientist, but these extra skills were excluded from my experiment. Surprisingly, Kirk Borne, known as an astrophysicist, scores high on the data science purity index. So does Gregory, who is known as a data miner.
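The purity score described above can be sketched as follows. The weights in `GENERIC_MIX` are the coefficients of the data science equation quoted earlier. Everything else is an assumption for illustration: I take "correlation" to mean the Pearson correlation across the four skills, the `purity` helper is a name introduced here, and the two profiles are hypothetical, not taken from the table.

```python
import math

# Coefficients of the generic data science skill mix from the previous article.
GENERIC_MIX = {
    "data mining": 0.24,
    "machine learning": 0.15,
    "analytics": 0.14,
    "big data": 0.11,
}


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (norm_x * norm_y)


def purity(profile):
    """Correlation between one person's 4-skill mix and the generic mix."""
    skills = list(GENERIC_MIX)
    return pearson([profile[s] for s in skills], [GENERIC_MIX[s] for s in skills])


# Hypothetical normalized profiles (placeholders, NOT the real data).
profiles = {
    "Person A": {"data mining": 0.40, "machine learning": 0.20, "analytics": 0.25, "big data": 0.15},
    "Person B": {"data mining": 0.10, "machine learning": 0.10, "analytics": 0.10, "big data": 0.70},
}

# Rank people from purest (highest correlation) to least pure.
ranking = sorted(profiles, key=lambda name: purity(profiles[name]), reverse=True)

for name in ranking:
    print(name, round(purity(profiles[name]), 2))
```

With only four skills per person, each correlation is computed on four points, so small changes in the endorsement counts can move a score a lot. The ranking is best read as indicative.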
Related article
Comment
Also, Kirk Borne is a professor of Astrophysics and Computational Science, not computer science.
Added value: This analysis could help employers make hiring decisions.
@Amy: It's almost, from a mathematical point of view, as if you have two spaces: One for people, and one for skills. I think there is a duality principle that allows you to switch from one to the other, as skills are determined by people, and people determined by skills. Finally, the coefficients in my equations are likely to change over time. Maybe even by geography.
Kirk is a 'purist', yes! Glad you're getting recognized. Keep up the good work.
@Majid: I changed his affiliation to EMC (that's what it says on LinkedIn).
In the line before "Comments": Marck Vaisman doesn't work for Pivotal.