This analysis shows the distribution of data scientists per country, city, gender and company. It is based on the Data Science Central (DSC) member database and only includes members who provided information about these fields at sign-up. Not all members provide their location, company or gender; a small majority does, and these members tend to be over-represented in the US. Differences with our main channel, Analyticbridge (created in 2007), are highlighted. Data Science Central was created in 2012, and its members tend to be early adopters of data science, machine learning and Hadoop techniques. Analyticbridge (AB) has a more technical audience focused on statistical science, operations research, predictive analytics, data mining and more traditional analytical techniques, and it includes a thriving forum section featuring technical questions about regression and clustering. Other DSC channels are not included in this study. AB and DSC now have similar member counts, though DSC membership is growing much faster, about six times as fast as AB.
This analysis was recently completed by our intern Livan and supersedes all our previous analyses.
Breakdown per country
The US accounts for more than 50% of DSC members, and for slightly less than that on AB.
Figure 1: DSC members across the globe
Figure 2: AB members across the globe
Figure 3: Top 20 countries for DSC ("others" includes members with missing data, who may belong to any country)
Figure 4: DSC members per country, log scale
Fast-growing countries include Singapore, Spain and Ireland. Senior data scientists, executives and entrepreneurs are typically found in the US, while India has a higher proportion of junior practitioners. Members in China seem very reluctant to provide personal information, which might explain why China is under-represented.
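Because only members who filled in their location are counted, each country's share is computed over reporting members rather than over all members. The sketch below illustrates that calculation; the function name `country_shares` and the sample list are hypothetical and are not DSC data or the method actually used for the figures.

```python
from collections import Counter
from typing import Optional


def country_shares(countries: list[Optional[str]]) -> dict[str, float]:
    """Share of members per country, computed only over members who reported one.

    Members with a missing country (None or an empty string) are excluded
    from the denominator, mirroring the restriction described in the intro.
    """
    reported = [c for c in countries if c]
    total = len(reported)
    if total == 0:
        return {}
    counts = Counter(reported)
    # most_common() orders countries from largest to smallest count.
    return {country: n / total for country, n in counts.most_common()}


# Hypothetical toy sample, not DSC data.
sample = ["US", "US", "India", None, "UK", "US", ""]
shares = country_shares(sample)
print(shares)  # {'US': 0.6, 'India': 0.2, 'UK': 0.2}
```

Since reporting members skew toward the US, a share computed this way can overstate the US proportion relative to the full membership.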
Breakdown per city
The Bay Area has the highest number of data scientists per 1,000 inhabitants. New York and London have more data scientists on DSC in absolute numbers (many of them in the finance industry), but fewer per 1,000 inhabitants.
Figure 5: Top 20 US cities
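The contrast between absolute counts and counts per 1,000 inhabitants comes down to dividing by population. The sketch below uses made-up city names, member counts and populations (not DSC counts or census figures) to show how a city with fewer members can still rank higher per capita; `per_thousand` is a hypothetical helper.

```python
def per_thousand(members: int, population: int) -> float:
    """Members per 1,000 inhabitants."""
    if population <= 0:
        raise ValueError("population must be positive")
    return 1000 * members / population


# Hypothetical figures for illustration only (not DSC or census data).
cities = {
    "City A": (1_200, 7_000_000),  # more members, much larger population
    "City B": (900, 3_000_000),    # fewer members, smaller population
}

for name, (members, population) in cities.items():
    rate = per_thousand(members, population)
    print(f"{name}: {members} members, {rate:.3f} per 1,000")
```

Here City A leads in absolute numbers, but City B has the higher density, which is the same pattern described for the Bay Area versus New York and London.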
Breakdown per company
75% of the 20 most represented companies (15 out of 20) appear in both databases, DSC and AB. Here we share the results for DSC. Note that the top 20 companies (dominated by IBM, Accenture, Oracle, AT&T, Microsoft, EMC, Teradata, SAS, Deloitte and HP) are either vendors (EMC, Teradata, SAS) or giant companies with relatively few data scientists per 1,000 employees (IBM, Microsoft, HP). While Facebook, LinkedIn, Twitter and Google might be the sexiest places to work, they are not in the top 10 in this study, although they were in a previous, smaller study based on 10,000 LinkedIn profiles connected to Dr Granville and spread across 6,000 companies. The top 20 companies account for only about 10% of all data, because company counts follow a Zipf distribution with a very thick tail. Many, if not most, data scientists work in small companies, as entrepreneurs, freelancers or consultants, or in education (as professors) or government agencies.
Most do not even have "data scientist" as their job title; read the following articles for details:
Figure 6: Top 20 companies for DSC
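Why can the top 20 companies hold only about 10% of members? Under a Zipf law, the company of rank i has a size proportional to i^(-s), so when the number of companies is large, the tail outweighs the head. The sketch below computes the top-k share for a Zipf model; the 50,000 companies and exponent 0.8 are hypothetical parameters that were not fitted to DSC data, and `zipf_weights` and `top_k_share` are illustrative helpers.

```python
def zipf_weights(n: int, s: float) -> list[float]:
    """Expected relative sizes of ranks 1..n under a Zipf law with exponent s."""
    return [rank ** -s for rank in range(1, n + 1)]


def top_k_share(counts: list[float], k: int) -> float:
    """Fraction of the total held by the k largest entries."""
    ordered = sorted(counts, reverse=True)
    total = sum(ordered)
    return sum(ordered[:k]) / total


# Hypothetical parameters: 50,000 companies, exponent 0.8 (not fitted to DSC data).
weights = zipf_weights(50_000, 0.8)
print(f"top-20 share: {top_k_share(weights, 20):.1%}")
```

The same `top_k_share` function could be applied to real per-company member counts. A smaller exponent or a larger number of companies makes the tail thicker and the top-20 share smaller.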
Breakdown per gender
The gender distributions on AB and DSC are almost identical; here we show the statistics for DSC. How to address gender imbalance is a difficult question, and the same is true of racial imbalance (in the US, STEM disciplines are dominated by people of Asian origin).
Figure 7: Gender distribution for DSC (light blue = male)