
This is a first attempt at classifying data scientists. I invite you to produce a more comprehensive, better solution.

The 10 pioneering data scientists listed here were identified as top data scientists in our previous article, Data Science Equation, based on their LinkedIn profiles. For each pioneer, we counted the endorsements for each of the top four data science related skills: analytics, big data, data mining and machine learning. These skills were identified in our previous article as the ones most strongly linked to data science. We then normalized the counts so that each is expressed as a ratio between 0 and 1 and, for each individual, the four ratios sum to 100%. This normalization makes the classification problem easier.
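As a minimal sketch of the normalization step described above: the function name `normalize_counts` and the endorsement counts below are hypothetical and are not taken from the article's table.

```python
def normalize_counts(counts):
    """Convert raw endorsement counts into shares that sum to 1."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no endorsements to normalize")
    return {skill: n / total for skill, n in counts.items()}


# Hypothetical counts for one person, NOT the article's data.
example_counts = {
    "analytics": 30,
    "big data": 10,
    "data mining": 40,
    "machine learning": 20,
}

shares = normalize_counts(example_counts)
for skill, share in shares.items():
    print(f"{skill}: {share:.2f}")
```

Each person gets one such row of four shares, so people with very different endorsement totals become directly comparable.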

Note that the correlation between machine learning and analytics is strongly negative (-0.82). Likewise, the correlation between big data and data mining is strongly negative (-0.80). All other cross-skill correlations are negligible.
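The cross-skill correlations could be computed along the following lines. This is only a sketch: it assumes Pearson correlation (the article does not say which kind was used), and the five profiles are hypothetical stand-ins for the article's 10 x 4 table.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical normalized shares (each row sums to 1), NOT the article's data.
profiles = [
    {"analytics": 0.40, "big data": 0.10, "data mining": 0.30, "machine learning": 0.20},
    {"analytics": 0.10, "big data": 0.30, "data mining": 0.20, "machine learning": 0.40},
    {"analytics": 0.25, "big data": 0.25, "data mining": 0.25, "machine learning": 0.25},
    {"analytics": 0.35, "big data": 0.05, "data mining": 0.45, "machine learning": 0.15},
    {"analytics": 0.15, "big data": 0.50, "data mining": 0.05, "machine learning": 0.30},
]

skills = sorted(profiles[0])

# One correlation per pair of skills: 6 pairs for 4 skills.
correlations = {
    (a, b): pearson([p[a] for p in profiles], [p[b] for p in profiles])
    for a, b in combinations(skills, 2)
}

for pair, r in correlations.items():
    print(pair, round(r, 2))
```

Because each person's four shares sum to 1, a high share in one skill necessarily lowers the others, so some negative correlation between skills is expected by construction.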

Notes:

  • (1) Kirk Borne's skill set is highly fragmented. Analytics is not listed, but many analytics-related skills are.
  • (2) Simon's profile experienced a spike in Analytics endorsements when we completed the second part of this analysis. The spike could be a data glitch.

Big Data (x-axis) / Machine Learning (y-axis) scatter-plot 

The big data / machine learning combo exhibits the strongest cluster structure among the 6 potential scatter-plots. Milind Bhandarkar (Pivotal's Chief Scientist), and to a lesser extent Eric Colson (former VP Data Science and Engineering at Netflix), are outliers, both very strong in big data. 

  • Kirk Borne (Professor, Astrophysics and Computational Science, George Mason University) and Monica Rogati (VP Data, Jawbone) constitute one cluster, with strong machine learning recognition.
  • DJ Patil (former LinkedIn Chief Scientist) and Dean Abbott (President, Abbott Analytics) belong to an intermediate cluster, with good machine learning skills but not known for big data.
  • Simon Zhang (LinkedIn director, previously at eBay; not a machine learning guy), as well as closely related (at least on the graph) Gregory Piatetsky-Shapiro (KDnuggets, Founder), Vincent Granville (Data Science Central, Co-Founder) and Marck Vaisman (EMC), constitute the remaining cluster.

Comments

  • Is this a "big data" analysis? Yes and no. Yes, because I extracted what I wanted out of terabytes of LinkedIn data, leveraging my expertise to minimize the amount of work and data processing required. No, because it did not involve massive data transfers - the information being well organized and easy to access efficiently. After all, you could say it's tiny data: 10 observations, 4 variables. But that 10 x 4 table is a summary table. Just identifying the data scientists with the most endorsements on LinkedIn isn't easy, unless you have domain expertise.
  • I performed what I would call "manual clustering". You could say that my analysis is light analytics. How much better (or worse!) can you do using heavy analytics, that is, by extracting far more data from LinkedIn (200 people selected out of 5,000; 10 metrics) and applying a real (not manual!) clustering algorithm? And which metric would you use to assess the lift created by heavy analytics over light analytics?
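For readers who want to try a "real" clustering algorithm on the big data / machine learning plane, here is a minimal k-means sketch. K-means is my choice for illustration, not a method used in this article. The six points are hypothetical (big data share, machine learning share) pairs, and the farthest-point initialization is a simplification that keeps the result deterministic.

```python
import math


def kmeans(points, k, iterations=100):
    """Plain Lloyd's k-means on a list of coordinate tuples."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")

    # Deterministic farthest-point initialization: start from the first
    # point, then repeatedly add the point farthest from all chosen centers.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        )

    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                new_centers.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center
        if new_labels == labels and new_centers == centers:
            break  # nothing changed, so the algorithm has converged
        labels, centers = new_labels, new_centers
    return labels, centers


# Hypothetical (big data share, machine learning share) pairs, NOT the article's data.
points = [
    (0.05, 0.50), (0.08, 0.45),  # strong machine learning
    (0.10, 0.10), (0.12, 0.05),  # weak on both
    (0.60, 0.05), (0.55, 0.10),  # strong big data
]

labels, centers = kmeans(points, k=3)
print(labels)
print(centers)
```

With 200 people and 10 metrics, the same function works unchanged on 10-dimensional tuples, although the number of clusters k would then need to be chosen with more care than by eye.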

Who is the purest data scientist?

I compared the 4-skill mix of each of these 10 data scientists (as found in the above table), with the generic data science skill mix identified in the previous article (Data Science = 0.24 * Data Mining + 0.15 * Machine Learning + 0.14 * Analytics + 0.11 * Big Data). In short, I computed 10 correlations (one per data scientist) to determine who best represents data science.
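A sketch of this purity score, assuming "correlation" means the Pearson correlation between a person's four shares and the four generic weights (the exact method is not spelled out here). The names `GENERIC_MIX` and `purity`, and the two example people, are hypothetical.

```python
import math

# Weights from the data science equation quoted above.
GENERIC_MIX = {
    "data mining": 0.24,
    "machine learning": 0.15,
    "analytics": 0.14,
    "big data": 0.11,
}


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def purity(skill_mix, reference=GENERIC_MIX):
    """Correlate one person's skill shares with the generic skill mix."""
    skills = sorted(reference)
    return pearson([skill_mix[s] for s in skills], [reference[s] for s in skills])


# Hypothetical people, NOT the article's data.
people = {
    # Shares exactly proportional to the generic weights.
    "person A": {"data mining": 0.375, "machine learning": 0.234375,
                 "analytics": 0.21875, "big data": 0.171875},
    # A big data specialist.
    "person B": {"data mining": 0.10, "machine learning": 0.10,
                 "analytics": 0.10, "big data": 0.70},
}

scores = {name: purity(mix) for name, mix in people.items()}
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(name, round(score, 2))
```

With only four skills, each correlation rests on four data points, so small changes in endorsement counts can move a score substantially.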

It turns out that Dean Abbott is closest to the 'average' (which I defined as purest), while Milind Bhandarkar (a Big Data, Hadoop guy) is farthest from the 'center'. Despite repeated claims (by myself and others) that I am a pure data scientist, I score only 0.43 (sure, I'm also some kind of product / marketing / finance / entrepreneur guy, not just a data scientist, but these extra skills were excluded from my experiment). Surprisingly, Kirk Borne, known as an astrophysicist, scores high on the data science purity index. So does Gregory, who is known as a data miner.


Comment by Majid ALDOSARI on November 22, 2013 at 8:27am

also Kirk Borne is professor of Astrophysics and Computational Science,..not computer science.

Comment by Louis Frolio on November 22, 2013 at 4:56am

I work at EMC and I verified Bhandarkar is in the Pivotal group.

Louis Frolio.

www.datatechblog.com

Comment by Vincent Granville on November 21, 2013 at 9:27pm

Added value: This analysis could help employers decide on hiring decisions.

Comment by Vincent Granville on November 21, 2013 at 6:35pm

@Amy: It's almost, from a mathematical point of view, as if you have two spaces: One for people, and one for skills. I think there is a duality principle that allows you to switch from one to the other, as skills are determined by people, and people determined by skills. Finally, the coefficients in my equations are likely to change over time. Maybe even by geography.

Comment by Carey G. Butler on November 21, 2013 at 3:21pm

Kirk is a 'purist', yes! Glad you're getting recognized. Keep up the good work.

Comment by Vincent Granville on November 21, 2013 at 12:58pm

@Majid: I changed his affiliation to EMC (that's what it says on LinkedIn). 

Comment by Majid ALDOSARI on November 21, 2013 at 12:15pm

 In the line before "Comments": Marck Vaisman doesn't work for Pivotal.
