
Top 20 Big Data Experts to Follow (Includes Scoring Algorithm)

This article has two parts: 

  • Listing the top 20 experts, along with their Twitter handles, rank in reverse order, number of Twitter followers, and Klout score. We hope to soon see a woman among the top 10. The top woman is currently #11.
  • Discussing a robust methodology to score experts


1. The top 20

This is a subset of a bigger list published here. Note that our data scientist is ranked #6.

Rank  Name               Twitter          Followers  Score
20    Bernard Marr       BernardMarr      86K        66.5
19    Jeremy Waite       jeremywaite      93K        67.5
18    R Ray Wang         wang0            80K        67.6
17    Hadley Wickham     hadleywickham    23K        68
16    Mike Briercliffe   mikejulietbravo  54K        68.5
15    Evan Sinar         EvanSinar        29K        68.6
14    Bob E. Hayes       bobehayes        5K         68.65
13    Dez Blanchfield    dez_blanchfield  77K        68.7
12    Andrew Ng          andrewng         48K        69.5
11    Hilary Mason       hmason           68K        70
10    Gregory Piatetsky  kdnuggets        48K        70.35
9     Ronald van Loon    Ronald_vanLoon   29K        71.5
8     Hans Rosling       HansRosling      296K       72.05
7     Randy Olson        randal_olson     80K        73
6     Vincent Granville  analyticbridge   128K       73.5
5     Timothy Hughes     Timothy_Hughes   134K       73.6
4     Kirk Borne         kirkdborne       58K        74
3     Vala Afshar        ValaAfshar       101K       78.5
2     Simon Porter       simonlporter     66K        80.5
1     Nate Silver        NateSilver538    1328K      81

2. Proposed Algorithm to Score Experts

Scores can measure many things: popularity, how influential someone is in a specific domain, and so on. We have worked on creating various lists over the past few years, typically with a goal different from that of journalists: rewarding expertise and the volume of quality publications and references over traditional popularity metrics. We have built various lists of top data science / big data experts:

You should check these three lists and the associated literature, not just out of curiosity, but to discover the methodology used in each case: a methodology designed by a real data scientist, not a black-box tool used by a journalist. Thus our lists are robust, sound and unbiased - or at least the bias is known and disclosed.

Since we have seen lists in the past where the #1 expert was irrelevant, here we propose a 3-step methodology to build lists and compute scores:

Step #1: Categorize sub-domains (of big data, data science, etc.)

Break the domain into sub-domains. For instance, we established a while back that 

Data Science = 0.24 * Data Mining + 0.15 * Machine Learning + 0.14 * Analytics + 0.11 * Big Data

Read this paper to learn about the methodology used to arrive at this equation. Note that weights and even sub-domains evolve over time. And these sub-domains overlap, though that's not difficult to handle.

Step #2: Categorize experts, and score by sub-domains

Start with a large list of experts, and make sure you are not missing any big ones (I have seen lists that were missing the number one expert).

Then categorize these experts according to pre-selected sub-domains (big data, machine learning, and so on in this case). This is performed by

  • scraping tons of tweets or blog posts from these experts (or better, from high-score people talking about these experts),
  • creating keyword frequency tables,
  • extracting (for each expert) keywords associated with the sub-domains,
  • and eventually clustering these experts by sub-domains.

This is done using an indexation algorithm. We have used an indexation (or automated tagging) algorithm in a very similar context, to assign sub-categories to 2,500 data science blogs. The methodology is explained in detail here. If the data is well structured, you can proceed as here: we were able to determine that Gregory Piatetsky-Shapiro and Vincent Granville belong to the same cluster, while Kirk Borne and Monica Rogati belong to another, machine-learning-heavy cluster.
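As a rough illustration of the steps listed above, here is a minimal Python sketch. It is not the indexation algorithm referenced in this article: the keyword lists, the sample posts, and the function names are all hypothetical, and the clustering step is replaced by a much simpler rule (assign each expert to the sub-domain whose keywords appear most often).

```python
import re
from collections import Counter

# Hypothetical keyword lists per sub-domain (illustrative only, not from the article).
SUBDOMAIN_KEYWORDS = {
    "data mining": {"mining", "patterns", "association"},
    "machine learning": {"learning", "classifier", "neural", "model"},
    "analytics": {"analytics", "dashboard", "kpi"},
    "big data": {"hadoop", "spark", "cluster", "petabyte"},
}


def keyword_frequencies(posts):
    """Build a keyword frequency table (word -> count) from a list of posts."""
    counts = Counter()
    for post in posts:
        counts.update(re.findall(r"[a-z]+", post.lower()))
    return counts


def subdomain_profile(posts):
    """For each sub-domain, count the occurrences of its keywords in the posts."""
    freq = keyword_frequencies(posts)
    return {
        name: sum(freq[word] for word in words)
        for name, words in SUBDOMAIN_KEYWORDS.items()
    }


def dominant_subdomain(posts):
    """Return the sub-domain with the highest keyword count, or None if nothing matches."""
    profile = subdomain_profile(posts)
    best = max(profile, key=profile.get)
    return best if profile[best] > 0 else None


# Made-up posts for two hypothetical experts.
posts_by_expert = {
    "expert_a": ["Training a neural model with deep learning", "A better classifier"],
    "expert_b": ["Scaling Hadoop and Spark on a large cluster"],
}
clusters = {name: dominant_subdomain(posts) for name, posts in posts_by_expert.items()}
```

With real data, the per-expert profiles returned by `subdomain_profile` could be fed to a proper clustering method instead of the dominant-keyword rule used here.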

Note: Klout scores (actually ranks) are also available at the sub-domain level, click here for details.

Step #3: Blend scores across sub-domains

Blend the scores obtained at the sub-domain level (in step #2) using the blending formula obtained in step #1.
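The blending step can be sketched in a few lines of Python. The weights come from the equation in step #1; everything else is an assumption made for illustration: the function name, the sample sub-domain scores, the choice to treat a missing sub-domain as 0, and the optional normalization (the four listed weights sum to 0.64, so dividing by that sum keeps the blended score on the same scale as the inputs).

```python
# Weights taken from the equation in step #1.
WEIGHTS = {
    "data mining": 0.24,
    "machine learning": 0.15,
    "analytics": 0.14,
    "big data": 0.11,
}


def blend(subdomain_scores, weights=WEIGHTS, normalize=True):
    """Weighted blend of sub-domain scores.

    Sub-domains missing from subdomain_scores count as 0 (an assumption).
    With normalize=True, the weighted sum is divided by the total weight,
    so the result stays on the same scale as the input scores.
    """
    total = sum(w * subdomain_scores.get(name, 0.0) for name, w in weights.items())
    if normalize:
        total /= sum(weights.values())
    return total


# Made-up sub-domain scores for one hypothetical expert.
scores = {
    "data mining": 70.0,
    "machine learning": 80.0,
    "analytics": 60.0,
    "big data": 75.0,
}
blended = blend(scores)
raw = blend(scores, normalize=False)
```

Since the weights and even the sub-domains evolve over time, keeping them in a single dictionary makes it easy to update the formula without touching the scoring code.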

Caveat: Experts that do not tweet or publish much might not have sub-domain scores that are statistically significant. This can be handled by computing an aggregated score across sub-domains, and ignoring the sub-domain scores. Statistical significance, at the score level, can be computed using the following method


Comment


Comment by Kirk Borne on March 4, 2016 at 1:48pm

To OG Mack Drama:  man, you are so very kind and thoughtful. Thank you a million times for your generous comments. I congratulate your accomplishments and growth in this field. We need many more, and you are setting a high standard for others to follow. Thanks! And wishing you all the best in your bright future.

Comment by OG Mack Drama on March 4, 2016 at 6:07am

To me, Dr Kirk Borne is the #1 Big Data Scientist. Why? I was first introduced to Dr Borne by my best friend from childhood, Mr. Steven G. Jackson of Seattle, WA, who is a consultant in Enterprise Solutions and my senior business advisor & partner. He also knows a lot about Big Data, and he blogs about it.

I did not have an inkling of what Big Data, Machine Learning, Predictive Analytics, Internet of Things, Cloud, Clustering, Data Mining, Data Warehousing, or Hadoop were (love the people over at Apache).

Not because he holds a lot of degrees from some prestigious university. Not because he works for one of the top consulting firms in the world.

It was because he allowed a man named OG Mack Drama (me) to glimpse into his world, & it was fascinating!

He welcomed my occasional intrusion into his social media sphere without missing a beat! It was enough for me to read up on the subject matter & discover that if you do not have at least a minimal understanding of its importance and significance, you & those around you will be lost in the coming century! America's education system is not generating enough students who wish to pursue Mathematics & other Science disciplines.

I am a minority. So we all know that if the country is short, my side is even shorter. #Facts. Well, I make it my duty to inform all those around me, especially people not involved with those disciplines, to get involved. "If OG Mack Drama can do it, so can you!" is what I am always telling younger & older people alike.

No, I am far from an expert or from fully understanding it all. But I know enough to grasp its importance & how profoundly it revolves around all we do.

If Dr Borne had been aloof, haughty or even elitist in his treatment of or interaction with me, I would not be here in this conversation!

I was even on the top 50 Twitter accounts to follow regarding Big Data (per Data Science Central). I know that's ridiculous, but it was done using analytics on certain data (semi-structured data). I talk about it. I tweet about it. I post about it. I RT about it.

Because my Klout score is 71, I am also in the top .05% of ALL SOCIAL MEDIA USERS (per an app that uses big data to measure a person's social media influence). That data profiling felt my presence.

That fact alone is helping me understand the science of BIG DATA. I am seeing it in my own life, not on an enterprise level but as an individual, which I know is darn near impossible! However, here I am!

I am conducting all my ventures utilizing Big Data & Predictive Analytics, to gain new market share & to beat out the competition.

All because I met a man who treated me as a decent human being.

Thank you to the #1 Data Scientist in the world, Dr Kirk Borne of Booz Allen Hamilton.

Comment by Leo Li on January 25, 2016 at 5:15pm

great article! Thanks

Comment by Prof. Dr. Diego Kuonen on January 18, 2016 at 1:35am

Great being in your neighbourhood on that graph, Vincent :)

Comment by Vincent Granville on January 17, 2016 at 10:51am

It looks like this analysis might have been performed by Marc Smith. It is posted here. Below is a graph that summarizes this info, posted on the same website. The original version is much larger.

Comment by Hajime Ozaki on January 16, 2016 at 2:10am

R Ray Wang's Twitter account is rwang0, not wang0.
