
There are many lists of top data scientists on Twitter. Here we comment on one of them, published on BigData-MadeSimple; most other lists have similar drawbacks. It has been argued that the overlap between my Top 25 data scientists on LinkedIn and the Twitter list is small because data scientists on LinkedIn and data scientists on Twitter are different populations (and maybe those on Quora or Google+ are different again).

But is this the real explanation? It turns out that Twitter lists typically suffer from big gaps (my LinkedIn list has big gaps too, but at least I mention the gaps and biases). This Twitter list misses people who should be in the top three.

So how do you create a good list (with no big gaps)? The answer is simple: use data science! More explicitly, here are 7 tips to make a good list (and we'll share our list with you when we are done).

7 tips to make great lists of popular data scientists

  1. Do not rank people; put them in alphabetical order (the rankings in the list below look arbitrary).
  2. Use sound metrics to identify top data scientists: people tweeting about data science, machine learning, Hadoop, etc. (assuming you know all the popular terms related to data science; if not, read this).
  3. Focus on robust metrics, such as number of followers (difficult to fake), rather than soft metrics such as number of tweets, which are easy to fake.
  4. Filter out bad data: people with a sudden massive spike in number of followers (likely fake followers purchased on the black market, especially if there is no matching spike in number of tweets and none of the followers are popular data scientists). Filter out people who are not data scientists (recruiters, for example) but occasionally or always tweet about data science. People whose tweets produce very few retweets are not influencers.
  5. Check profiles over time: an old profile that stopped growing 3 years ago should score lower than a recent profile that is growing fast.
  6. Using hashtags to identify popular data scientists is not a great idea: you will miss all tweeters who rarely use hashtags or who use hashtags that you don't know. You must combine hashtags with other metrics to identify popular people.
  7. Identify automated vs. manual tweets: most big accounts have some automated tweets, but if more than 90% of the tweets are automated, then we are dealing with a bot, not a human.
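
A few of these tips (1, 3, 4, 5 and 7) can be sketched in Python. This is only an illustration, not the method used to build any actual list: the Account fields, the helper names, and every threshold except the 90% automation cutoff from tip 7 are assumptions made up for the example, and the data would have to be collected separately.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Account:
    # All fields are hypothetical; fill them from whatever data you collect.
    handle: str
    name: str
    followers_by_month: List[int]  # follower counts, oldest first
    tweets: int
    automated_tweets: int
    retweets_received: int
    is_commercial: bool = False    # e.g. found on a white list of corporate accounts


def has_follower_spike(history: List[int], max_monthly_growth: float = 0.5) -> bool:
    """Tip 4: flag a sudden jump in followers (threshold is an assumption)."""
    for prev, cur in zip(history, history[1:]):
        if prev > 0 and (cur - prev) / prev > max_monthly_growth:
            return True
    return False


def is_bot(acct: Account, max_automated_share: float = 0.9) -> bool:
    """Tip 7: more than 90% automated tweets means a bot, not a human."""
    return acct.tweets > 0 and acct.automated_tweets / acct.tweets > max_automated_share


def recent_growth(history: List[int], window: int = 3) -> float:
    """Tip 5: relative follower growth over the last `window` months."""
    if len(history) <= window or history[-window - 1] <= 0:
        return 0.0
    base = history[-window - 1]
    return (history[-1] - base) / base


def shortlist(accounts: List[Account], top_n: int = 33,
              min_retweets_per_tweet: float = 0.1) -> List[Account]:
    """Filter out bad data, keep the top_n by score, return them alphabetically."""
    kept = []
    for a in accounts:
        if a.is_commercial or not a.followers_by_month or a.tweets == 0:
            continue
        if has_follower_spike(a.followers_by_month) or is_bot(a):
            continue
        if a.retweets_received / a.tweets < min_retweets_per_tweet:
            continue  # tip 4: very few retweets, not an influencer
        kept.append(a)
    # Tip 3 and tip 5: followers, boosted for profiles that are still growing.
    kept.sort(key=lambda a: a.followers_by_month[-1] * (1 + recent_growth(a.followers_by_month)),
              reverse=True)
    # Tip 1: the score only decides who is in; the published order is alphabetical.
    return sorted(kept[:top_n], key=lambda a: a.name.lower())
```

The score is used only to decide who makes the cut; the returned list is alphabetical, so no ranking is implied. Identifying which tweets are automated and which accounts are commercial is left outside the sketch.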

One of the people missing from the initial list of 33 data scientists (below) is @analyticbridge. It has more followers (22,900) than pretty much all the people listed below, and it is growing faster than many of them, currently faster than (say) @kdnuggets (both accounts reached 20,000 followers almost the same week, a few weeks ago). I'm sure many others are missing for the same reason: they don't use their name in their profile; @analyticbridge uses 'big data science' rather than his name. Of course, when considering these types of accounts, you need to filter out commercial accounts owned by corporations, but that should be easy using white lists of commercial Twitter accounts.

So who's @analyticbridge? Who else is missing? I'll leave it to you to find out, but we hope to provide a list of our own soon, to fill the gap. Of course, @analyticbridge must be someone with a small ego, as he's not interested in having his name published. In an era of privacy scares, not using your real name could be a good strategy.

Notes

  • Besides making your list accurate, not missing very popular, highly connected data scientists will get your list of "top data science tweeters" retweeted and shared by the most connected thought leaders (the ones you did not miss!), potentially multiplying the traffic volume to your website by a factor of ten. From a journalistic point of view, not including these people is a big missed opportunity for free traffic to your website.
  • Another Twitter account, created by my business partner (and cofounder of Data Science Central), is @DataScienceCtrl. It has close to 7,000 followers and is growing even faster than @AnalyticBridge, so it could also fit in the top 33. However, we view this account more as a business account. Also, its tweets will soon be mostly automated, and we hope that it will become the second best source of automated tweets about data science (we expect the first source to be a secret project that we are currently working on).

BigData-MadeSimple list of top 33 data scientists

They added @analyticbridge in position #34 after I mentioned the issue. The number after each handle is the number of followers as of today; @analyticbridge (not in the original list) has 22,900.

  1. Hilary Mason @hmason - 44,600
  2. John Myles White @johnmyleswhite - 8,573
  3. Peter Skomoroch @peteskomoroch - 18,100
  4. Gregory Piatetsky @kdnuggets - 21,000
  5. Ryan Rosario @DataJunkie - 8,794
  6. DJ Patil @dpatil - 18,300
  7. Jeff Hammerbacher @hackingdata - 16,300
  8. David Smith @revodavid - 11,200
  9. Christopher D. Long @octonion - 11,300
  10. Carla Gentry @data_nerd - 13,400
  11. Ben Lorica @bigdata - 18,900
  12. Siah @siah - 4,876
  13. Ferenc Huszar @fhuszar - 2,113
  14. Drew Conway @drewconway - 9,902
  15. Michael Wu Ph.D. @mich8elwu - 8,332
  16. Matt Wood @mza - 6,643
  17. Olivier Grisel @ogrisel - 7,328
  18. Josh Wills @josh_wills - 5,714
  19. John Foreman @John4man - 9,151
  20. Jake Porway @jakeporway - 7,088
  21. Andrew Ng @AndrewYNg - 18,500
  22. Eric Xu @mathena - 10,400
  23. Monica Rogati @mrogati - 9,342
  24. P. Oscar Boykin @posco - 4,640
  25. Benedikt Koehler @furukama - 6,449
  26. David Gutelius @gutelius - 2,413
  27. Marck Vaisman @wahalulu - 1,340
  28. Andreas Weigend @aweigend - 2,413
  29. Amy Heineike @aheineike - 1,561
  30. Sebastian Thrun @SebastianThrun - 24,300
  31. Jen Lowe @datatelling - 4,558
  32. Doug Cutting @cutting - 10,500
  33. Kirk Borne @KirkDBorne - 12,600

Comment by Livan Alonso on July 26, 2014 at 6:47pm

Hi Amine

Very interesting article. It would be great to implement different metrics and compare them. Personally, I prefer a few great followers to thousands of fake followers.

Thanks for your comment.

Best regards,

Livan

Comment by amine benhenni on July 10, 2014 at 10:55pm

Regarding the number of followers, this is an interesting read: https://medium.com/i-data/fake-friends-with-real-benefits-eec8c4693bd3
Good metrics would include network-based centrality measures, but they are more involved to implement!

© 2014   Data Science Central
