Data Science & Machine Learning Encyclopedia – 4,000 Entries

This is one of the first comprehensive repositories covering machine learning, data science, statistical science, and computer science. In addition to the classics, it features many brand new scalable, big-data algorithms published in the last two years, such as automated cataloging, causation detection, and model-free tests of hypotheses. The original title for this project was Handbook of Data Science, but over time it grew much bigger than a handbook. This is still an ongoing project.


Time and budget permitting, we will turn it into a traditional 10-volume encyclopedia, possibly even available in print. Contact us if you are interested in partnering with us as a blogger, scientific author, publisher, reviewer, editor, or sponsor. For now, here is an update on the progress of this initiative and, more specifically, on what is currently available to you.

  • Many of the most popular articles can be found in our weekly digests, which are archived here.
  • The search box in the top right corner of any web page can be used to find specific documents on our network, including, in many cases, great articles that you won’t find on Google. Here is a list of popular keywords (clickable links) to get you started.
  • An Excel spreadsheet featuring 4,000 articles with creation date, title, URL, and popularity. The popularity of an article is a score based on log(log(unique page views)), because the raw page view counts follow a Zipf distribution. The details are unimportant; what matters is that we turned it into a robust metric in the spreadsheet. The spreadsheet is available to our members only: click here, and check the last item in the bullet list. An older, much smaller version is available to non-members. It is still worth reading, as it explains how to adjust popularity for creation date and decay over time using survival models. We plan to catalog our articles using our indexation technique, and even to predict the popularity of an article (adjusted for creation time) using our Hidden Decision Tree technology. You could do this yourself too: it is a great project for a data scientist in training, with all the data and technology available online from Data Science Central.
  • A selection of the best articles from our past weekly digests: click here. Articles from that list will be used to produce my upcoming book, Data Science 2.0, just as we used earlier articles to create our first book.
  • A selection of great resources from outside our network: click here.
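
To see why a double logarithm tames Zipf-distributed page views, here is a minimal Python sketch of the raw score only. It assumes natural logarithms. It also assumes a floor of 3 views, since log(log(x)) is undefined for x ≤ 1 and negative for x < e. Both choices are ours, not the spreadsheet's actual formula, and the creation-date and survival-model adjustments are not modeled. The function name popularity_score is hypothetical.

```python
import math

def popularity_score(unique_page_views: int) -> float:
    """Hypothetical double-log popularity score (assumes natural logs).

    Views are floored at 3 (an assumption) so the score is always
    defined and positive, since log(log(x)) needs x > 1 and is
    negative for x < e.
    """
    views = max(unique_page_views, 3)
    return math.log(math.log(views))

# Raw counts spanning four orders of magnitude...
views = [100, 1_000, 10_000, 100_000, 1_000_000]
scores = [round(popularity_score(v), 3) for v in views]
# ...are compressed into a narrow band of scores.
print(scores)
```

With these assumptions, counts from 100 to 1,000,000 map to scores of roughly 1.5 to 2.6. A handful of viral articles therefore no longer dominates the metric.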

Finally, if you only have time to read a few seminal articles written in simple English, visit this page: Hitchhiker’s Guide to Data Science, Machine Learning, R, and Python.