Please check the updated list of projects. We've added quite a few!
Here's the response sent to applicants in early May 2014:
We have received a lot of interest from many, many qualified candidates, most of whom already have a strong analytic background.
Currently, I am the only reviewer for the submitted projects (and only two potential projects are offered), and even if I could find some external help, chances are that other reviewers/mentors would be more traditional, and not necessarily the kind of practitioners that you want to work with.
In view of the large response, please email me at [email protected] if
- You are only interested in the certification (also called endorsement). Many candidates could earn it without our apprenticeship, just based on their bio.
- You are OK if other reviewers/mentors are brought in. These more traditional mentors might have more time to share with students. My style, by contrast, is somewhat hands-off due to my workload, and is suited to self-learners.
Also, I'd like to offer more than two projects - it's just a question of having the bandwidth. But I made the following decision: any project that could help DSC will qualify. In fact, the RSS feed project currently approved clearly helps DSC. Just submit a short project proposal, and if we like it, you can use it as your core project for the apprenticeship.
Here are new project ideas:
- I could share some traffic statistics about 40,000 pages on DSC, and you work on the data to identify the types of articles and other metrics associated with success (and how do you measure success in the first place?). This includes identifying great content for our audience, forecasting articles' lifetime and pageviews based on subject line or category, assessing the impact of re-tweets, likes, and sharing on traffic, and detecting factors impacting Google organic traffic. Designing a tool to identify new trends and hot keywords would also be useful. Lots of NLP (natural language processing) is involved in this type of project; it might also require crawling our websites.
- Another potential project is the creation of a redirect URL shortener like http://bit.ly, but one that correctly counts the number of clicks. Bit.ly (and also the Google URL shortener) provides statistics that are totally wrong for traffic originating from email clients (e.g. Outlook, which represents our largest traffic source). Their numbers are inflated by more than 300%. It's possible that an easy solution consists of counting and reporting the number of users/visitors (after filtering out robots), rather than pageviews. Test your URL redirector and make sure only real human beings are counted (not robots or fake traffic).
- Another project: create a list of the top 500 data scientists or big data experts using public data such as Twitter, and rate them based on number of followers or better criteria (also identify new stars and trends - note that new stars have fewer followers even though they might be more popular, as it takes time to build a list of followers). Classify top practitioners into a number of categories (unsupervised clustering) based on their expertise (identified by keywords or hashtags in their postings). Separate automated tweets from real ones - in short, identify genuine tweets posted by the author rather than feeds automatically blended with the author's tweets (you can try with my account @AnalyticBridge, which is a blend of external RSS feeds with my own tweets - some posted automatically, some manually). Create groups of data scientists. I started a similar analysis a while back; click here for details.
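To make the URL shortener idea more concrete, here is a minimal sketch in Python of counting unique visitors per short link instead of raw pageviews, after filtering out probable robots. Everything in it is an assumption made for illustration: the `Click` record, the `visitor_id` field (for instance a cookie ID), and the `BOT_MARKERS` list are hypothetical, and a user-agent check alone is unlikely to catch all automated traffic.

```python
from dataclasses import dataclass

# Hypothetical substrings used to flag automated clients.
# A real filter would need a maintained list and more signals than the user agent.
BOT_MARKERS = ("bot", "crawler", "spider", "preview", "scanner")


@dataclass(frozen=True)
class Click:
    short_code: str   # the shortened URL identifier that was requested
    visitor_id: str   # hypothetical: e.g. a cookie ID or a hash of IP + user agent
    user_agent: str   # User-Agent header sent with the request


def is_probable_robot(user_agent: str) -> bool:
    """Flag empty user agents and those containing a known bot marker."""
    ua = user_agent.lower()
    return ua == "" or any(marker in ua for marker in BOT_MARKERS)


def count_unique_visitors(clicks):
    """Return {short_code: number of distinct non-robot visitors}."""
    visitors = {}
    for click in clicks:
        if is_probable_robot(click.user_agent):
            continue  # skip traffic that looks automated
        visitors.setdefault(click.short_code, set()).add(click.visitor_id)
    return {code: len(ids) for code, ids in visitors.items()}


# Made-up sample log: one repeat visitor, one Outlook user, one crawler, one empty agent.
sample_clicks = [
    Click("abc", "v1", "Mozilla/5.0 (Windows NT 6.1)"),
    Click("abc", "v1", "Mozilla/5.0 (Windows NT 6.1)"),
    Click("abc", "v2", "Microsoft Office/14.0 (Outlook)"),
    Click("abc", "v3", "Googlebot/2.1"),
    Click("xyz", "v1", ""),
]
unique_counts = count_unique_visitors(sample_clicks)
```

In this made-up log, the repeat request from the same visitor counts once and the crawler and empty-agent requests are dropped, so the reported number reflects visitors rather than hits.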
Finally, the questionnaire described in step 4 might be published online (dozens of questions), and you will have to select 5 questions (3 technical, 2 non-technical) from the list. Initially, I mentioned that I would email randomly selected questions separately to each candidate.
I am hopeful that we can get many candidates started this Monday, May 5. Priority will be given to candidates with a strong analytic background, located in countries that contribute to our revenue (US, UK, Canada, Spain, Germany, Australia, Singapore, etc.).