
10 Python Machine Learning Projects on GitHub

Here is a list of top Python machine learning projects on GitHub. A continuously updated list of open-source learning projects is available on Pansop.

 

scikit-learn

scikit-learn is a Python module for machine learning built on top of SciPy. It features various classification, regression and clustering algorithms, including support vector machines, logistic regression, naive Bayes, random forests, gradient boosting, k-means and DBSCAN, and is designed to interoperate with the Python numerical and scientific libraries NumPy and SciPy.
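To make one of the algorithms named above concrete, here is a minimal pure-Python sketch of k-means (Lloyd's algorithm). It does not use scikit-learn, whose own implementation lives in sklearn.cluster.KMeans; the function name kmeans, its parameters, and the sample points are assumptions made up for this illustration.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, initial_centroids, iterations=10):
    """Illustrative k-means: returns (centroids, labels) after a fixed number of passes."""
    centroids = list(initial_centroids)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, labels


# Hypothetical toy data: two well-separated groups of 2-D points.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
centroids, labels = kmeans(points, initial_centroids=[points[0], points[3]])
print(labels)
```

The starting centroids are passed in explicitly so the run is deterministic; production implementations usually pick them with a randomized scheme and run several restarts.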

Official source code repo:

NuPIC

The Numenta Platform for Intelligent Computing (NuPIC) is a machine intelligence platform that implements the HTM learning algorithms. HTM is a detailed computational theory of the neocortex. At the core of HTM are time-based continuous learning algorithms that store and recall spatial and temporal patterns. NuPIC is suited to a variety of problems, particularly anomaly detection and prediction of streaming data sources.
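As a rough illustration of what anomaly detection on a streaming source means, the sketch below flags values that sit far from a running mean. It is not HTM and does not use NuPIC: it swaps in a much simpler technique (a z-score over running statistics kept with Welford's method), and the class name, threshold, warm-up length and sample stream are all assumptions for this example.

```python
import math


class StreamingAnomalyDetector:
    """Flags a value when it is more than `threshold` standard deviations from the running mean."""

    def __init__(self, threshold=3.0, warmup=5):
        self.threshold = threshold
        self.warmup = warmup  # number of values to observe before scoring
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations (Welford)

    def update(self, x):
        """Score x against the history so far, then learn from it."""
        is_anomaly = False
        if self.n >= self.warmup:
            std = math.sqrt(self.m2 / (self.n - 1))
            if std > 0 and abs(x - self.mean) / std > self.threshold:
                is_anomaly = True
        # Welford's online update of mean and squared deviations.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return is_anomaly


# Hypothetical sensor readings with one obvious spike.
stream = [10.0, 10.2, 9.9, 10.1, 10.0, 9.8, 10.1, 25.0, 10.0, 9.9]
detector = StreamingAnomalyDetector()
flags = [detector.update(x) for x in stream]
print([x for x, flagged in zip(stream, flags) if flagged])
```

Each value is processed once and only three numbers of state are kept, which is the property that makes this style of detector usable on unbounded streams. Unlike HTM, it has no notion of temporal patterns.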

Pattern

Pattern is a web mining module for Python. It has tools for data mining, natural language processing, network analysis and machine learning. It supports a vector space model, clustering, and classification using KNN, SVM and Perceptron.
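The combination of a vector space model with KNN classification can be sketched in a few lines of standard-library Python. This is not Pattern's API; the helper names (vectorize, cosine, knn_classify) and the tiny training set are hypothetical and exist only to show the idea: documents become word-count vectors, and a query takes the majority label of its most similar neighbours.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words vector: maps each lowercase token to its count."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def knn_classify(query, training, k=3):
    """Return the majority label among the k training documents most similar to the query."""
    q = vectorize(query)
    ranked = sorted(training, key=lambda item: cosine(q, vectorize(item[0])), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical labelled documents.
training = [
    ("the team won the match", "sports"),
    ("the striker scored a late goal", "sports"),
    ("the coach praised the team", "sports"),
    ("the bank raised interest rates", "finance"),
    ("stocks fell as the market closed", "finance"),
    ("investors sold shares in the bank", "finance"),
]
label = knn_classify("the bank cut interest rates", training)
print(label)
```

Raw counts let common words such as "the" influence similarity; a real pipeline would typically apply stop-word removal or TF-IDF weighting before comparing vectors.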

Pylearn2

Pylearn2 is a library designed to make machine learning research easy. It is built on top of Theano.

Ramp

Ramp is a Python library for rapid prototyping of machine learning solutions. It's a lightweight, pandas-based machine learning framework that plugs into existing Python machine learning and statistics tools (scikit-learn, rpy2, etc.). Ramp provides a simple, declarative syntax for exploring features, algorithms and transformations quickly and efficiently.

MILK

Milk is a machine learning toolkit in Python. Its focus is on supervised classification with several classifiers available: SVMs, k-NN, random forests and decision trees. It also performs feature selection. These classifiers can be combined in many ways to form different classification systems. For unsupervised learning, Milk supports k-means clustering and affinity propagation.

skdata

Skdata is a library of data sets for machine learning and statistics. This module provides standardized Python access to toy problems as well as popular computer vision and natural language processing data sets.

mlxtend

mlxtend is a library of useful tools and extensions for day-to-day data science tasks.

machine-learning-samples

A collection of sample applications built using Amazon Machine Learning.

REP

REP is an environment for conducting data-driven research in a consistent and reproducible way. It provides a unified classifier wrapper for a variety of implementations such as TMVA, Sklearn, XGBoost and uBoost. It can train classifiers in parallel on a cluster, and it supports interactive plots.



Comment by Hassan Bashari on December 3, 2015 at 11:00pm

It seems the powerful "pybrain" library is omitted. It has comprehensive modules, especially for ANNs.

Comment by Marzena Bihun on May 27, 2015 at 9:46pm

At university I was exposed to the NLTK platform in a Natural Language Processing course, and they convinced us that this toolkit is the best for NLP. I had never heard of the Pattern project, and I would be curious whether anyone has used both NLTK and Pattern. Are they comparable or, for certain tasks, is one of them superior to the other? Thanks.


© 2017 Data Science Central
