For Python programmers, scikit-learn is one of the best libraries for building machine learning applications. It is ideal for beginners because it has a simple interface and is well documented, with many examples and tutorials.
Besides supervised machine learning (classification and regression), it can also be used for clustering, dimensionality reduction, feature extraction and engineering, and data pre-processing. The interface is consistent across all of these methods. Every estimator is trained with fit and then applied with predict or transform. This makes the library easy to use, and it also makes it easy to build a large collection of classification or regression models and train them all with the same commands.
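To illustrate the consistent interface, here is a minimal sketch that trains three different classifiers with identical commands. It assumes the small Iris dataset that ships with scikit-learn (not one of the datasets used in this post), and the chosen models and hyperparameters are arbitrary examples:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Stand-in dataset bundled with scikit-learn (assumption: not the glass data)
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Three unrelated algorithms, all exposing the same fit/score interface
models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(random_state=42),
    "k_nearest_neighbors": KNeighborsClassifier(n_neighbors=5),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)                   # identical training call
    scores[name] = model.score(X_test, y_test)    # mean accuracy on the test set

for name, acc in scores.items():
    print(f"{name}: {acc:.3f}")
```

Because each model is driven by the same two calls, adding another classifier to the comparison only means adding one more entry to the dictionary.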
In this blog post, let's look at how to build, train, evaluate and validate a classifier with scikit-learn, and in doing so get familiar with the library.
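As a rough outline of those four steps, the sketch below uses the Wine dataset bundled with scikit-learn as a numerical stand-in (an assumption, not the glass data) and a random forest as an example classifier. It builds the model, trains it, evaluates it on a held-out test set and validates it with 5-fold cross-validation:

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import cross_val_score, train_test_split

# Numerical stand-in dataset (assumption: used here instead of the glass data)
X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Build and train
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

# Evaluate on data the model has not seen during training
y_pred = clf.predict(X_test)
test_accuracy = accuracy_score(y_test, y_pred)
print(f"Test accuracy: {test_accuracy:.3f}")
print(classification_report(y_test, y_pred))

# Validate: 5-fold cross-validation on the training portion
cv_scores = cross_val_score(
    RandomForestClassifier(n_estimators=100, random_state=0),
    X_train, y_train, cv=5,
)
print(f"CV accuracy: {cv_scores.mean():.3f} +/- {cv_scores.std():.3f}")
```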
The glass dataset contains data on six types of glass (from building windows, containers, tableware, headlamps, etc.), and each type of glass can be identified by its content of several chemical elements (for example Na, Fe and K). This dataset contains only numerical data, which makes it a good dataset to get started with.
The second dataset contains non-numerical data, so we will need an additional step that encodes the categorical data as numerical data.