
Scikit-learn Classification Algorithms

This article was written by Matthew Mayo.

Scikit-learn is the de facto standard machine learning library in the Python ecosystem. As described on its official website, Scikit-learn is:

  • Simple and efficient tools for data mining and data analysis
  • Accessible to everybody, and reusable in various contexts
  • Built on NumPy, SciPy, and matplotlib
  • Open source, commercially usable – BSD license 

This tutorial demonstrates several machine learning classifiers. It is inspired by, references, and incorporates techniques from the following excellent works:

  • Randal Olson’s An Example Machine Learning Notebook
  • Analytics Vidhya’s Common Machine Learning Algorithms Cheat Sheet
  • Scikit-learn’s official Cross-validation Documentation
  • Scikit-learn’s official Iris Dataset Documentation
  • The various tutorials referenced in a KDnuggets Python Machine Learning article I recently wrote, which have likely influenced this one as well

We will use the well-known Iris and Digits datasets to build models with the following machine learning classification algorithms:

  • Logistic Regression
  • Decision Tree
  • Support Vector Machine
  • Naive Bayes
  • k-nearest Neighbors
  • Random Forests
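
As a rough sketch of what building these six models can look like, the snippet below fits one Scikit-learn estimator per algorithm on the Iris data. The specific estimator classes and hyperparameters (for example `GaussianNB` for Naive Bayes, `n_neighbors=5`, `max_iter=1000`) are illustrative assumptions, not necessarily the settings used in the full tutorial. The score printed here is training accuracy, which is optimistic because the model has already seen the data it is scored on.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Iris: 150 samples, 4 features, 3 classes
X, y = load_iris(return_X_y=True)

# One estimator per algorithm in the list above.
# Hyperparameters are assumptions chosen for illustration.
classifiers = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Support Vector Machine": SVC(),
    "Naive Bayes": GaussianNB(),
    "k-nearest Neighbors": KNeighborsClassifier(n_neighbors=5),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Every Scikit-learn classifier exposes the same fit/score interface,
# so all six can be trained in a single loop.
training_accuracy = {}
for name, clf in classifiers.items():
    clf.fit(X, y)
    training_accuracy[name] = clf.score(X, y)
    print(f"{name}: training accuracy = {training_accuracy[name]:.3f}")
```

Because all estimators share the `fit`/`predict`/`score` interface, swapping in the Digits dataset only requires changing the loading call to `load_digits`.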

We also use different strategies for evaluating models:

  • Separate testing and training datasets
  • k-fold Cross-validation
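
The two evaluation strategies can be sketched as follows, here with a k-nearest neighbors classifier on the Digits data. The 75/25 split, the 5 folds, and the choice of classifier are assumptions made for illustration, not values taken from the full tutorial.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Digits: 1797 8x8 grayscale images flattened to 64 features, 10 classes
X, y = load_digits(return_X_y=True)

# Strategy 1: separate training and testing datasets.
# stratify=y keeps the class proportions similar in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)
clf = KNeighborsClassifier(n_neighbors=5)
clf.fit(X_train, y_train)
holdout_accuracy = clf.score(X_test, y_test)
print(f"Hold-out accuracy: {holdout_accuracy:.3f}")

# Strategy 2: k-fold cross-validation (k=5 here).
# Each fold serves once as the test set while the other four train the model.
cv_scores = cross_val_score(KNeighborsClassifier(n_neighbors=5), X, y, cv=5)
print(f"5-fold accuracy: {cv_scores.mean():.3f} +/- {cv_scores.std():.3f}")
```

A single hold-out split is cheap but its result depends on which samples land in the test set; cross-validation costs k model fits and gives a mean and spread instead of one number.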

Some simple data investigation methods and tools are used as well, including:

  • Plotting data with Matplotlib
  • Building and working with data via Pandas dataframes
  • Constructing and operating on multi-dimensional arrays and matrices with NumPy
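
A minimal sketch of these three tools working together on the Iris data is shown below. The output file name `iris_petals.png` and the choice of the two petal features for the plot are hypothetical, picked only for illustration.

```python
import matplotlib

# Non-interactive backend so the script also runs without a display;
# remove this line to view the figure interactively with plt.show().
matplotlib.use("Agg")

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()

# NumPy: per-class feature means via boolean masks, giving a (3, 4) array
class_means = np.array(
    [iris.data[iris.target == k].mean(axis=0) for k in range(3)]
)

# Pandas: wrap the feature matrix in a dataframe with named columns;
# indexing target_names with the integer labels yields species names
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df["species"] = iris.target_names[iris.target]
print(df.groupby("species").mean())

# Matplotlib: scatter plot of petal length vs. petal width, one color per species
fig, ax = plt.subplots()
for name, group in df.groupby("species"):
    ax.scatter(group["petal length (cm)"], group["petal width (cm)"], label=name)
ax.set_xlabel("petal length (cm)")
ax.set_ylabel("petal width (cm)")
ax.legend()
fig.savefig("iris_petals.png")  # hypothetical output file name
```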

This tutorial is brief and to the point. Please alert me if you find inaccuracies. If you find it useful, please feel free to share it far and wide.

To read the tutorial, with the demonstration, click here.