This was the subject of a question asked on Quora: What are the top 10 data mining or machine learning algorithms?

Some modern algorithms, such as collaborative filtering, recommendation engines, segmentation, or attribution modeling, are missing from the lists below. Algorithms from graph theory (to find the shortest path in a graph, or to detect connected components), from operations research (the simplex algorithm, to optimize the supply chain), or from time series analysis are not listed either. And I could not find MCMC (Markov Chain Monte Carlo) and related algorithms used to process hierarchical, spatio-temporal and other Bayesian models. What else is missing?

In 2006, the IEEE International Conference on Data Mining (ICDM) identified the top 10 data mining algorithms as follows:

  1. C4.5 (Decision Trees)
  2. k-Means (clustering)
  3. Support Vector Machines (SVM)
  4. Apriori
  5. Expectation Maximization (EM)
  6. PageRank
  7. AdaBoost
  8. k-Nearest Neighbors (kNN)
  9. Naive Bayes
  10. Classification and Regression Trees (CART)

A 2011 answer to the Quora question lists the following as potential candidates or additions:

  1. Kernel Density Estimation and Non-parametric Bayes Classifier
  2. K-Means
  3. Kernel Principal Components Analysis
  4. Linear Regression
  5. Neighbors (Nearest, Farthest, Range, k, Classification)
  6. Non-Negative Matrix Factorization
  7. Support Vector Machines
  8. Dimensionality Reduction
  9. Fast Singular Value Decomposition
  10. Decision Tree
  11. Bootstrapped SVM
  12. Decision Tree
  13. Gaussian Processes
  14. Logistic Regression
  15. Logit Boost
  16. Model Tree
  17. Naïve Bayes
  18. Nearest Neighbors
  19. PLS (Partial Least Squares)
  20. Random Forest
  21. Ridge Regression
  22. Support Vector Machine
  23. Classification: logistic regression, naïve Bayes, SVM, decision tree
  24. Regression: multiple regression, SVM
  25. Attribute importance: MDL
  26. Anomaly detection: one-class SVM
  27. Clustering: k-means, orthogonal partitioning
  28. Association: Apriori
  29. Feature extraction: NNMF

And a 2015 answer provides the following:

  1. Linear regression
  2. Logistic regression
  3. k-means
  4. SVMs
  5. Random Forests
  6. Matrix Factorization/SVD
  7. Gradient Boosted Decision Trees/Machines
  8. Naive Bayes
  9. Artificial Neural Networks
  10. For the last one I'd let you pick one of the following:
        • Bayesian Networks
        • Elastic Nets
        • Any other clustering algo besides k-means
        • LDA
        • Conditional Random Fields
        • HDPs or other Bayesian non-parametric models

My point of view is of course biased, but I would also like to add some algorithms developed or re-developed at Data Science Central's research lab:

  • Jackknife regression
  • Feature extraction / selection (mentioned above, but this version is very different)
  • Hidden decision trees
  • Indexation and tagging algorithms

These algorithms are described in the article What you won't learn in statistics classes.

Regarding the indexation algorithm (see Part 2 of that article): it is probably at least 20 years old. It is an incredibly fast clustering technique: it does not require n x n memory storage, only storage proportional to n, where n is the number of observations. It is also easy to implement in distributed Map-Reduce or Hadoop environments. It is a fundamental algorithm, the core algorithm used to build taxonomies, catalogs (see this article about Amazon), search engines, and enterprise search solutions. DSC has used it successfully in numerous contexts, including IoT automated growth hacking for digital publishing, where it categorizes articles and boosts them depending (among other things) on their category, for maximum efficiency.
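The post does not spell out the indexation algorithm itself, so what follows is only a minimal sketch of the general idea under our own assumptions: each observation is a short text such as an article title, keywords are extracted with a simple tokenizer and a hypothetical stop-word list, and observations sharing a keyword form a cluster. The names `tokenize`, `map_phase`, `reduce_phase` and `index_clusters` are illustrative, not DSC's code. The point it shows is that no n x n similarity matrix is ever built: the index holds one entry per (keyword, observation) pair, and the map/reduce split mirrors how the work could be distributed.

```python
import re
from collections import defaultdict

# Hypothetical stop-word list (an assumption); a real system would use a curated one.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with"}


def tokenize(text):
    """Split text into lowercase keywords (hyphenated words kept whole), dropping stop words."""
    words = re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())
    return [w for w in words if w not in STOP_WORDS]


def map_phase(item_id, text):
    """Map step: emit one (keyword, item_id) pair per distinct keyword of the item."""
    for keyword in set(tokenize(text)):
        yield keyword, item_id


def reduce_phase(pairs):
    """Reduce step: group item ids by keyword, building the index keyword -> items."""
    index = defaultdict(list)
    for keyword, item_id in pairs:
        index[keyword].append(item_id)
    return index


def index_clusters(items, min_size=2):
    """Cluster items by shared keywords without any n x n distance matrix.

    Memory grows with the number of (keyword, item) pairs, i.e. linearly in n
    when each item carries a bounded number of keywords.
    """
    pairs = (pair for item_id, text in items.items() for pair in map_phase(item_id, text))
    index = reduce_phase(pairs)
    # Keep only keywords shared by at least `min_size` items, sorted for stable output.
    return {k: sorted(v) for k, v in sorted(index.items()) if len(v) >= min_size}


articles = {
    1: "Introduction to k-means clustering",
    2: "Scaling k-means on Hadoop",
    3: "Hadoop and Map-Reduce for beginners",
    4: "Random forests explained",
}
print(index_clusters(articles))  # {'hadoop': [2, 3], 'k-means': [1, 2]}
```

In an actual Map-Reduce or Hadoop job, the map and reduce functions would run on separate workers, with the framework's shuffle step grouping pairs by keyword; here both run in one process. Note that an item can fall into several clusters (item 2 above), which suits tagging and taxonomy building but differs from a hard partition such as k-means.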

Comment by Dalila Benachenhou on February 22, 2016 at 7:18am

Self Organizing Map (missing)

Comment by Prof. Dr. Diego Kuonen on December 17, 2015 at 5:20am

More than 2/3 are from the statistics' community :)

Comment by Dermot Cochran on December 11, 2015 at 6:02am

The list of machine learning algorithms seems very long; is there any way to organize or categorize into different kinds?

Comment by Krishna Sankar on December 8, 2015 at 8:38am

Thanks. Excellent addition to the original question. I was the one who asked the question, and it still lives.

© 2016 Data Science Central