Comparing classification algorithms: pluses and minuses

What are the advantages of different classification algorithms?

For instance, if we have a large training data set with more than approximately 10,000 instances and more than 100,000 features, which classifier would be the best choice?


Xavier Amatriain, PhD in CS, former professor and coder, answered the question:

There are a number of dimensions you can look at to give you a sense of what will be a reasonable algorithm to start with, namely:

  • Number of training examples
  • Dimensionality of the feature space
  • Do I expect the problem to be linearly separable?
  • Are features independent?
  • Are features expected to have a linear relationship with the target variable?
  • Is overfitting expected to be a problem?
  • What are the system’s requirements in terms of speed/performance/memory usage?
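
The first two dimensions are easy to quantify before choosing anything. As a minimal sketch in Python, here is the example from the question expressed as numbers. The `DatasetShape` class and its property names are hypothetical helpers introduced for illustration, not part of the original answer, and the "more features than examples" check is only a rough signal that overfitting may deserve attention.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetShape:
    """Hypothetical helper that records the size of a training set."""
    n_examples: int
    n_features: int

    @property
    def features_per_example(self) -> float:
        # Ratio of feature count to training-example count.
        return self.n_features / self.n_examples

    @property
    def is_high_dimensional(self) -> bool:
        # More features than examples: a setting where overfitting
        # is a common concern (a rough heuristic, not a rule).
        return self.n_features > self.n_examples


# Numbers taken from the question: ~10,000 instances, ~100,000 features.
shape = DatasetShape(n_examples=10_000, n_features=100_000)
print(shape.features_per_example)  # 10.0
print(shape.is_high_dimensional)   # True
```

In this example the features outnumber the training examples ten to one, which already answers two items on the list and hints at a third.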

This list may seem a bit daunting because many of these questions are not straightforward to answer. The good news, though, is that, as with many problems in life, you can address this question by following the Occam’s Razor principle: use the least complicated algorithm that can address your needs, and only go for something more complicated if strictly necessary.
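
One way to turn this principle into a selection rule is sketched below in Python. It rests on several assumptions that are not in the original answer: the function `pick_simplest_adequate` and its `tolerance` parameter are hypothetical, the candidates are assumed to be listed from least to most complicated, and the validation scores are made-up placeholder values used only to show how the rule behaves.

```python
from typing import Sequence, Tuple


def pick_simplest_adequate(
    candidates: Sequence[Tuple[str, float]],
    tolerance: float = 0.01,
) -> str:
    """Return the first (simplest) candidate whose score is within
    `tolerance` of the best score.

    `candidates` must be ordered from least to most complicated; each
    entry is (name, validation score), with higher scores being better.
    """
    if not candidates:
        raise ValueError("at least one candidate is required")
    best = max(score for _, score in candidates)
    for name, score in candidates:
        if score >= best - tolerance:
            return name
    raise AssertionError("unreachable: the best candidate always qualifies")


# Made-up validation accuracies, for illustration only.
scores = [
    ("logistic regression", 0.912),
    ("support vector machine", 0.915),
    ("tree ensemble", 0.918),
    ("deep learning", 0.919),
]
print(pick_simplest_adequate(scores))                   # logistic regression
print(pick_simplest_adequate(scores, tolerance=0.002))  # tree ensemble
```

The `tolerance` value encodes what "strictly necessary" means for a given project: a looser tolerance keeps the simpler model, while a tighter one justifies moving to something more complicated.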

– Logistic Regression

– Support Vector Machines 

– Tree Ensembles

– Deep Learning

The full article was posted as a Quora question, which includes 22 answers.
