Here is a list of 5 machine learning open source projects from top internet companies.

In keeping with Airbnb’s vision of humans partnering with machines in a symbiotic way to exceed the capabilities of either alone, its Aerosolve project focuses on improving the understanding of data sets by helping people interpret complex data with easy-to-understand models. Instead of hiding meaning beneath many layers of model complexity, Aerosolve models expose data to the light of understanding.

For example, the negative correlation between the price of a listing in a market and the demand for that listing can be read off simply by inspecting a visualization of the model. Rather than passing features through many deep hidden layers of non-linear transforms, Aerosolve makes models very wide, with each variable or combination of variables modeled explicitly using additive functions. This keeps the model easy to interpret while still leaving it a lot of capacity to learn.
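The wide, additive structure described above can be sketched in a few lines. This is only an illustration and not Aerosolve's API: the feature names, bucket edges, and weights are all hypothetical. Each feature gets its own piecewise-constant function, and the score is simply the sum of the per-feature contributions.

```python
from bisect import bisect_right

# Hypothetical per-feature functions: each feature is split into buckets and
# every bucket carries its own weight (all numbers here are made up).
ADDITIVE_MODEL = {
    "price": {"edges": [50, 100, 200], "weights": [0.8, 0.3, -0.2, -0.9]},
    "num_reviews": {"edges": [1, 10], "weights": [-0.4, 0.1, 0.5]},
}


def contribution(feature, value):
    """Return the additive contribution of a single feature value."""
    spec = ADDITIVE_MODEL[feature]
    bucket = bisect_right(spec["edges"], value)  # index of the bucket holding value
    return spec["weights"][bucket]


def score(example):
    """Sum the per-feature contributions; no hidden layers are involved."""
    return sum(contribution(name, value) for name, value in example.items())


def explain(example):
    """Expose each feature's share of the score so it can be inspected."""
    return {name: contribution(name, value) for name, value in example.items()}
```

Because every feature contributes independently, printing `explain({"price": 150, "num_reviews": 5})` shows directly how much each input pushed the score up or down. In this made-up model the price weights fall as the price rises, which is the kind of negative relationship the paragraph above describes.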

FAIR open sources deep-learning modules for Torch

Facebook has open sourced optimized deep-learning modules for Torch. These modules are significantly faster than the default ones in Torch and have accelerated their research projects by allowing them to train larger neural nets in less time.

The recent release includes GPU-optimized modules for large convolutional nets (ConvNets), as well as for networks with sparse activations that are commonly used in Natural Language Processing applications. The ConvNet modules include a fast FFT-based convolutional layer using custom CUDA kernels built around NVIDIA's cuFFT library. For a deeper dive, have a look at this paper.
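The idea behind an FFT-based convolutional layer is that convolution in the signal domain becomes pointwise multiplication in the frequency domain. The sketch below shows this for a 1-D signal in plain Python. It is a CPU illustration with hypothetical function names, not Facebook's CUDA/cuFFT code, and it computes a true (kernel-flipping) convolution, whereas ConvNet layers often use the unflipped variant.

```python
import cmath


def fft(values, inverse=False):
    """Recursive radix-2 FFT; len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2], inverse)
    odd = fft(values[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def fft_convolve(signal, kernel):
    """Full linear convolution via pointwise multiplication in frequency space."""
    out_len = len(signal) + len(kernel) - 1
    size = 1
    while size < out_len:
        size *= 2  # zero-pad to a power of two so circular wrap-around is avoided
    fa = fft(list(signal) + [0] * (size - len(signal)))
    fb = fft(list(kernel) + [0] * (size - len(kernel)))
    product = [x * y for x, y in zip(fa, fb)]
    result = fft(product, inverse=True)
    # The inverse transform above is unnormalized, so divide by the size here.
    return [(v / size).real for v in result[:out_len]]


def direct_convolve(signal, kernel):
    """Reference O(n*m) convolution used to check the FFT version."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out
```

Both functions return the same values up to floating-point error. The FFT route tends to pay off as kernels get larger, since its cost grows as n log n rather than with the product of the signal and kernel lengths.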

In addition to this module, the release includes a number of other CUDA-based modules and containers, including:

Containers that allow the user to parallelize the training on multiple GPUs using either the data-parallel model (mini-batch split over GPUs) or the model-parallel model (network split over multiple GPUs).

An optimized Lookup Table that is often used when learning embeddings of discrete objects (e.g. words) and in neural language models.

A Hierarchical SoftMax module to speed up training over an extremely large number of classes.

Cross-map pooling (sometimes known as MaxOut) often used for certain types of visual and text models.

A GPU implementation of 1-bit SGD based on the paper by Frank Seide, et al.

A significantly faster Temporal Convolution layer, which computes the 1-D convolution of an input with a kernel, typically used in ConvNets for speech recognition and natural language applications. Facebook's version improves upon the original Torch implementation by utilizing the same BLAS primitives in a significantly more efficient regime. Observed speedups range from 3x to 10x on a single GPU, depending on the input sizes, kernel sizes, and strides.
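To make the last item concrete, here is a minimal sketch of what a temporal (1-D) convolution computes. It assumes a single input channel, no padding, and a kernel applied without flipping, as is conventional in ConvNet layers. The function and parameter names are illustrative, and the code is unrelated to the Torch or Facebook implementations.

```python
def temporal_convolution(sequence, kernel, stride=1):
    """Slide `kernel` over `sequence` in steps of `stride` (no padding)."""
    if stride < 1:
        raise ValueError("stride must be a positive integer")
    width = len(kernel)
    outputs = []
    # Each output frame is the dot product of the kernel with one window.
    for start in range(0, len(sequence) - width + 1, stride):
        window = sequence[start:start + width]
        outputs.append(sum(w * x for w, x in zip(kernel, window)))
    return outputs
```

For an input of length n and a kernel of width k with n >= k, this produces (n - k) // stride + 1 output frames, which is why the input size, kernel size, and stride all affect the running time.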

TensorFlow is an open source software library for numerical computation using data flow graphs. TensorFlow was originally developed by researchers and engineers working on the Google Brain team within Google's Machine Intelligence research organization to conduct machine learning and deep neural network research. The system is general enough to be applicable in a wide variety of other domains as well.

In this architecture, nodes in the graph represent mathematical operations, while the graph edges represent the multidimensional data arrays (tensors) that flow between them. This flexible architecture lets users deploy computation to one or more CPUs or GPUs in a desktop, server, or mobile device without rewriting code.
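The data flow graph idea can be illustrated with a toy evaluator. The sketch below does not use the TensorFlow API: `Node`, `placeholder`, `add`, `multiply`, and `run` are hypothetical names, and plain Python lists stand in for tensors. The graph is built first, then evaluated with concrete inputs.

```python
class Node:
    """A graph node: an operation plus the nodes whose outputs feed into it."""

    def __init__(self, op, inputs=(), name=None):
        self.op = op
        self.inputs = tuple(inputs)
        self.name = name


def placeholder(name):
    """A node whose value is supplied at run time through `feed`."""
    return Node("placeholder", name=name)


def add(a, b):
    return Node("add", (a, b))


def multiply(a, b):
    return Node("multiply", (a, b))


def run(node, feed, cache=None):
    """Evaluate `node` by first evaluating everything upstream of it."""
    cache = {} if cache is None else cache
    if id(node) in cache:  # shared sub-graphs are computed only once
        return cache[id(node)]
    if node.op == "placeholder":
        value = list(feed[node.name])
    else:
        left, right = (run(n, feed, cache) for n in node.inputs)
        if node.op == "add":
            value = [x + y for x, y in zip(left, right)]
        elif node.op == "multiply":
            value = [x * y for x, y in zip(left, right)]
        else:
            raise ValueError("unknown operation: " + node.op)
    cache[id(node)] = value
    return value


# Build the graph once, then run it with different inputs.
x = placeholder("x")
w = placeholder("w")
b = placeholder("b")
y = add(multiply(w, x), b)  # elementwise y = w * x + b
```

Calling `run(y, {"x": [1, 2, 3], "w": [2, 2, 2], "b": [1, 1, 1]})` walks the edges from the placeholders to the output node. Because the graph is only a description of the computation, a real system is free to place different nodes on different devices.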

All-pairs similarity via DIMSUM

Twitter has contributed DIMSUMv2 to the Spark and Scalding open-source projects.

Twitter has a constant need to find users, hashtags and ads that are very similar to one another, so that they can be recommended and shown to users and advertisers. To do this, they consider many pairs of items and evaluate how “similar” they are to one another.

To solve this “all-pairs similarity” problem, Twitter has developed a new efficient algorithm called “Dimension Independent Matrix Square using MapReduce,” or DIMSUM for short, which made one of their most expensive computations 40% more efficient.
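The sampling idea behind this kind of algorithm can be sketched on a single machine. The code below is a simplified illustration, not Twitter's Spark or Scalding implementation: the function names, the sparse-row input format, and the exact sampling rule are assumptions based on a reading of the published description. Products for pairs of heavy columns are emitted less often, and the sums are rescaled so that each estimate is unbiased for the cosine similarity.

```python
import math
import random


def column_norms(rows, num_cols):
    """Euclidean norm of every column of a sparse, row-oriented matrix."""
    sums = [0.0] * num_cols
    for row in rows:
        for col, value in row.items():
            sums[col] += value * value
    return [math.sqrt(s) for s in sums]


def sampled_cosine_similarities(rows, num_cols, gamma, seed=0):
    """Estimate cosine similarity for column pairs that co-occur in a row.

    `rows` is a list of {column_index: nonzero_value} dicts. `gamma` is an
    oversampling parameter: larger values sample more and are more accurate.
    """
    rng = random.Random(seed)
    norms = column_norms(rows, num_cols)
    sums = {}
    # "Map" step: emit each co-occurring product with a probability that
    # shrinks as the two columns get heavier.
    for row in rows:
        cols = sorted(row)
        for position, i in enumerate(cols):
            for j in cols[position + 1:]:
                prob = min(1.0, gamma / (norms[i] * norms[j]))
                if rng.random() < prob:
                    sums[(i, j)] = sums.get((i, j), 0.0) + row[i] * row[j]
    # "Reduce" step: rescale so the expected value equals the cosine.
    return {
        (i, j): total / min(gamma, norms[i] * norms[j])
        for (i, j), total in sums.items()
    }
```

With a very large `gamma` every product is kept and the result is the exact cosine similarity. Lowering `gamma` trades accuracy for less work on pairs of very frequent items.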

LinkedIn's FeatureFu project contains a set of libraries and tools that help with advanced feature engineering, such as using extended s-expression based feature transformation to derive features on top of other features, or converting a lightweight model such as logistic regression or a decision tree into a feature, in an intuitive way without touching any code.

FeatureFu uses Expr, a library written in Java, to evaluate mathematical s-expressions.

It can be used for feature normalization, feature combination, nonlinear featurization and many other use cases.
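To show how s-expressions can describe derived features without code changes, here is a small evaluator sketch. It is written in Python for illustration and is not the Expr library: the operator set, the function names, and the feature names in the usage note are all assumptions.

```python
import math

# Operators supported by this sketch (an assumed, illustrative set).
OPERATORS = {
    "+": lambda *args: sum(args),
    "*": lambda *args: math.prod(args),
    "-": lambda a, b: a - b,
    "/": lambda a, b: a / b,
    "log": lambda a: math.log(a),
    "min": min,
    "max": max,
}


def tokenize(text):
    """Split an s-expression string into parentheses and atoms."""
    return text.replace("(", " ( ").replace(")", " ) ").split()


def parse(tokens):
    """Turn a token list into nested Python lists, consuming tokens in place."""
    token = tokens.pop(0)
    if token == "(":
        expr = []
        while tokens[0] != ")":
            expr.append(parse(tokens))
        tokens.pop(0)  # drop the closing ")"
        return expr
    try:
        return float(token)
    except ValueError:
        return token  # an operator or a feature name


def evaluate(expr, features):
    """Evaluate a parsed expression against a dict of raw feature values."""
    if isinstance(expr, float):
        return expr
    if isinstance(expr, str):
        return features[expr]
    operator, *operands = expr
    return OPERATORS[operator](*(evaluate(o, features) for o in operands))


def derive(text, features):
    """Parse and evaluate an s-expression in one step."""
    return evaluate(parse(tokenize(text)), features)
```

With this in place, a normalization such as `derive("(/ (- price mean) std)", {"price": 120.0, "mean": 100.0, "std": 10.0})` or a nonlinear combination such as `"(* clicks (log (+ 1 views)))"` is just a string, so new derived features can live in configuration rather than in code.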

This list is generated by Largol.

Posted 1 March 2021
