5 Machine Learning Open Source Projects From Top Internet Companies

Here is a list of 5 machine learning open source projects from top internet companies.


In keeping with Airbnb’s vision of humans partnering with machines in a symbiotic way to exceed the capabilities of either alone, its Aerosolve project focuses on improving the understanding of data sets by helping people interpret complex data with easy-to-understand models. Instead of hiding meaning beneath many layers of model complexity, Aerosolve models expose data to the light of understanding.

With such a model, they can easily determine the negative correlation between the price of a listing in a market and the demand for the listing just by inspecting an image of the model. Rather than passing features through many deep hidden layers of non-linear transforms, they make models very wide, with each variable or combination of variables modeled explicitly using additive functions. This makes the model easy to interpret while still maintaining a lot of capacity to learn.
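To make the idea of a wide, additive model concrete, here is a minimal Python sketch. It is not Aerosolve's API: the feature names, bucket edges and weights are invented for illustration, and each feature simply gets its own piecewise-constant function whose outputs are summed.

```python
import bisect

def make_step_function(edges, values):
    """Build a piecewise-constant function of one feature.

    `edges` are sorted bucket boundaries and `values` holds one weight per
    bucket, so len(values) must equal len(edges) + 1.
    """
    def f(x):
        return values[bisect.bisect_right(edges, x)]
    return f

# Hypothetical per-feature functions (numbers are made up for illustration).
MODEL = {
    "price": make_step_function([50, 100, 200], [0.8, 0.3, -0.2, -0.9]),
    "reviews": make_step_function([5, 20], [-0.1, 0.2, 0.5]),
}

def score(features):
    """Additive model: the score is the sum of independent per-feature terms."""
    return sum(MODEL[name](value) for name, value in features.items())

def explain(features):
    """Return each feature's individual contribution to the score."""
    return {name: MODEL[name](value) for name, value in features.items()}

if __name__ == "__main__":
    listing = {"price": 250, "reviews": 25}
    print(score(listing))
    print(explain(listing))
```

Because every feature contributes through its own function, the effect of price on the score can be read directly from the `price` weights, which is the kind of interpretability the paragraph above describes.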

FAIR open sources deep-learning modules for Torch

Facebook has open sourced optimized deep-learning modules for Torch. These modules are significantly faster than the default ones in Torch and have accelerated their research projects by allowing them to train larger neural nets in less time.

The recent release includes GPU-optimized modules for large convolutional nets (ConvNets), as well as networks with sparse activations that are commonly used in Natural Language Processing applications. Their ConvNet modules include a fast FFT-based convolutional layer using custom CUDA kernels built around NVIDIA's cuFFT library. For a deeper dive, have a look at this paper.
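The principle behind an FFT-based convolutional layer can be sketched in a few lines of NumPy. This is only a 1-D, CPU-side illustration of the convolution theorem, not the CUDA/cuFFT kernels from the release; the function names are chosen here for the example.

```python
import numpy as np

def conv_direct(signal, kernel):
    """Reference: full linear convolution computed in the time domain."""
    return np.convolve(signal, kernel, mode="full")

def conv_fft(signal, kernel):
    """Same convolution via the frequency domain.

    Both inputs are zero-padded to the full output length so that the
    circular convolution implied by the FFT equals the linear one.
    """
    n = len(signal) + len(kernel) - 1
    spectrum = np.fft.rfft(signal, n) * np.fft.rfft(kernel, n)
    return np.fft.irfft(spectrum, n)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    signal = rng.normal(size=1000)
    kernel = rng.normal(size=31)
    print(np.allclose(conv_direct(signal, kernel), conv_fft(signal, kernel)))
```

The pointwise product in the frequency domain replaces the sliding-window sum, which tends to pay off as kernels get larger.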

In addition to this module, the release includes a number of other CUDA-based modules and containers, including:

  • Containers that allow the user to parallelize the training on multiple GPUs using either the data-parallel model (mini-batch split over GPUs) or the model-parallel model (network split over multiple GPUs).

  • An optimized Lookup Table that is often used when learning embeddings of discrete objects (e.g. words) and neural language models.

  • A Hierarchical SoftMax module to speed up training over an extremely large number of classes.

  • Cross-map pooling (sometimes known as MaxOut) often used for certain types of visual and text models.

  • A GPU implementation of 1-bit SGD based on the paper by Frank Seide, et al.

  • A significantly faster Temporal Convolution layer, which computes the 1-D convolution of an input with a kernel, typically used in ConvNets for speech recognition and natural language applications. Their version improves upon the original Torch implementation by utilizing the same BLAS primitives in a significantly more efficient regime. Observed speedups range from 3x to 10x on a single GPU, depending on the input sizes, kernel sizes, and strides.
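As an illustration of how a temporal convolution can be expressed with BLAS-style primitives, the NumPy sketch below unfolds the input into overlapping windows and then performs a single matrix multiplication. It is a CPU sketch under assumed array layouts (time-major input, kernel weights per output channel), not the Torch or CUDA code from the release, and like most ConvNet layers it does not flip the kernel.

```python
import numpy as np

def temporal_conv(x, weight, stride=1):
    """1-D convolution over time expressed as one matrix product.

    x:      (time, in_dim) input sequence
    weight: (out_dim, kernel_width, in_dim) kernels
    returns (frames, out_dim), frames = (time - kernel_width) // stride + 1
    """
    time, in_dim = x.shape
    out_dim, kernel_width, _ = weight.shape
    frames = (time - kernel_width) // stride + 1
    # Unfold: each row is one window of kernel_width time steps, flattened.
    windows = np.stack([
        x[t * stride : t * stride + kernel_width].reshape(-1)
        for t in range(frames)
    ])
    # One GEMM call computes every output frame and channel at once.
    return windows @ weight.reshape(out_dim, kernel_width * in_dim).T

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(10, 3))
    w = rng.normal(size=(4, 3, 3))
    print(temporal_conv(x, w, stride=2).shape)
```

How the windows are batched into matrix products is one of the things that determines how efficiently the underlying BLAS routine runs, which is why the speedup quoted above depends on input sizes, kernel sizes and strides.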

TensorFlow Library

TensorFlow is an open source software library for numerical computation using data flow graphs. TensorFlow was originally developed by researchers and engineers working on the Google Brain team within Google's Machine Intelligence research organization for the purposes of conducting machine learning and deep neural network research. The system is general enough to be applicable in a wide variety of other domains as well.

In its architecture, nodes in the graph represent mathematical operations, while the graph edges represent the multidimensional data arrays (tensors) that flow between them. This flexible architecture lets users deploy computation to one or more CPUs or GPUs in a desktop, server, or mobile device without rewriting code.
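The graph idea can be illustrated without TensorFlow itself. The following toy Python sketch is not the TensorFlow API: the `Node`, `placeholder`, `add` and `mul` names are invented here, and plain lists stand in for tensors. It only shows the two phases of first building a graph of operations and then pushing data through it.

```python
class Node:
    """A graph node: an operation plus the nodes whose outputs feed it."""

    def __init__(self, op, *inputs):
        self.op = op          # callable, or None for a placeholder
        self.inputs = inputs  # edges: upstream nodes supplying data

    def evaluate(self, feed):
        # Placeholders take their value from the feed dictionary.
        if self.op is None:
            return feed[self]
        args = [node.evaluate(feed) for node in self.inputs]
        return self.op(*args)

def placeholder():
    return Node(None)

def add(a, b):
    return Node(lambda x, y: [p + q for p, q in zip(x, y)], a, b)

def mul(a, b):
    return Node(lambda x, y: [p * q for p, q in zip(x, y)], a, b)

if __name__ == "__main__":
    # Phase 1: describe the computation; nothing is calculated yet.
    x = placeholder()
    y = placeholder()
    z = add(mul(x, y), x)
    # Phase 2: feed data along the edges and run the graph.
    print(z.evaluate({x: [1, 2, 3], y: [4, 5, 6]}))  # [5, 12, 21]
```

Separating the description of the computation from its execution is what allows a runtime to decide where each operation runs.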

All-pairs similarity via DIMSUM

Twitter has contributed DIMSUMv2 to the Spark and Scalding open-source projects.

Twitter has a constant need to find users, hashtags and ads that are very similar to one another, so they can be recommended and shown to users and advertisers. To do this, they consider many pairs of items and evaluate how “similar” they are to one another.

To solve this “all-pairs similarity” problem, Twitter developed a new efficient algorithm called “Dimension Independent Matrix Square using MapReduce,” or DIMSUM for short, which made one of their most expensive computations 40% more efficient.
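The following single-process Python sketch loosely follows the sampling idea behind DIMSUM as it is usually described: pairs of columns with large norms are sampled less often, and the sampled sums are rescaled to estimate cosine similarity. It is a simplified illustration, not the Spark or Scalding implementation; the function names, the sparse-row layout and the oversampling parameter `gamma` are assumptions made for the example.

```python
import math
import random
from collections import defaultdict

def column_norms(rows, n_cols):
    """L2 norm of every column of a sparse matrix given as row dicts."""
    squares = [0.0] * n_cols
    for row in rows:
        for j, value in row.items():
            squares[j] += value * value
    return [math.sqrt(s) for s in squares]

def dimsum_cosine(rows, n_cols, gamma, seed=0):
    """Estimate cosine similarity between all column pairs.

    rows:  list of {column_index: nonzero_value} dicts
    gamma: oversampling parameter; larger values sample more pairs
           and give more accurate estimates.
    """
    rng = random.Random(seed)
    norms = column_norms(rows, n_cols)
    sums = defaultdict(float)
    # "Map" phase: emit each co-occurring pair with a probability that
    # shrinks as the two column norms grow.
    for row in rows:
        entries = sorted(row.items())
        for a, (j, vj) in enumerate(entries):
            for k, vk in entries[a + 1:]:
                p = min(1.0, gamma / (norms[j] * norms[k]))
                if rng.random() < p:
                    sums[(j, k)] += vj * vk
    # "Reduce" phase: rescale so the expected value is the cosine.
    sims = {}
    for (j, k), total in sums.items():
        scale = norms[j] * norms[k]
        sims[(j, k)] = total / scale if gamma / scale > 1 else total / gamma
    return sims

if __name__ == "__main__":
    rows = [{0: 1.0, 1: 2.0}, {0: 3.0, 2: 1.0}, {1: 1.0, 2: 4.0}]
    print(dimsum_cosine(rows, n_cols=3, gamma=2.0))
```

With a very large `gamma` every pair is kept and the result is the exact cosine similarity; smaller values trade accuracy for less work.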


LinkedIn's FeatureFu project contains a set of libraries and tools for advanced feature engineering. For example, extended s-expression based feature transformations can derive features on top of other features, or convert a lightweight model such as logistic regression or a decision tree into a feature, in an intuitive way without touching any code.

FeatureFu uses Expr, an evaluator written in Java, to evaluate mathematical s-expressions.

It can be used for feature normalization, feature combination, nonlinear featurization and many other use cases.
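To show what deriving features from s-expressions can look like, here is a minimal evaluator sketched in Python. It does not reproduce Expr's actual grammar or operator set: the operators, the feature names and the `derive` helper are all assumptions chosen for illustration.

```python
import math
import operator
from functools import reduce

# Illustrative operator table; a real library would offer many more.
OPS = {
    "+": lambda *a: sum(a),
    "*": lambda *a: reduce(operator.mul, a, 1.0),
    "-": lambda a, b: a - b,
    "/": lambda a, b: a / b,
    "log": math.log,
    "sigmoid": lambda z: 1.0 / (1.0 + math.exp(-z)),
}

def tokenize(text):
    return text.replace("(", " ( ").replace(")", " ) ").split()

def parse(tokens):
    """Turn a token list into nested Python lists, numbers and names."""
    token = tokens.pop(0)
    if token == "(":
        expr = []
        while tokens[0] != ")":
            expr.append(parse(tokens))
        tokens.pop(0)  # discard the closing ")"
        return expr
    try:
        return float(token)
    except ValueError:
        return token  # a feature or operator name

def evaluate(expr, features):
    if isinstance(expr, float):
        return expr
    if isinstance(expr, str):
        return features[expr]  # look up an existing feature by name
    op, *args = expr
    return OPS[op](*(evaluate(arg, features) for arg in args))

def derive(text, features):
    """Compute one derived feature from an s-expression string."""
    return evaluate(parse(tokenize(text)), features)

if __name__ == "__main__":
    row = {"price": 120.0, "mean": 100.0, "std": 10.0, "x1": 2.0, "x2": 4.0}
    # Feature normalization written as configuration rather than code.
    print(derive("(/ (- price mean) std)", row))
    # A small logistic regression model turned into a single feature.
    print(derive("(sigmoid (+ (* 0.5 x1) (* -0.25 x2) 0.1))", row))
```

Since the transformation lives in a string, it can be stored in a configuration file and changed without recompiling, which matches the "without touching any code" goal described above.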

This list is generated by Largol.
