
Enterprise applications are increasingly adopting machine learning as a strategic capability, and running deep machine learning analytics across multiple problem statements is becoming common. A variety of machine learning solutions, packages, and platforms exist in the market. One of the first challenges teams face is choosing the right platform or package for their solution.

Based on my limited experience with different machine learning solutions, I wrote this blog to list the points (features, in machine learning terms) to consider while choosing a specific ML platform, and to list the pros and cons of each of the solutions in the market.

Let’s look at the feature set to be weighed before deciding on an ML solution:

| High Level Feature | Feature Set | Comments |
|---|---|---|
| Data Storage | High Storage Volume Need | Ability to store huge volumes of data and keep up with ever-growing storage needs |
| | High Availability | High availability of data on partial failures |
| Data Exploration | Visualizing summary tables and patterns in input data | Ability to find patterns in input data |
| Data Preparation / Cleansing | Feature Extraction | Manipulating the raw data to extract features needed for algorithm execution |
| | Distributed Execution | Ability to perform the data manipulation in a distributed way; this is required when you have a huge volume of data and need to reduce the time to complete |
| Development | Supported Languages | Scripting language support for development |
| | Ease of Development | How easy is it to develop and execute scripts on the platform? |
| | General Purpose Programming | Does the platform support general purpose programming needs? |
| Model | Algorithms Supported | Availability of different algorithm implementation packages on the platform |
| | Distributed Execution in Model Creation | Model creation is a time-consuming operation, so the ability to create the model in a distributed way saves a lot of time |
| | Deep Learning Support | Support for deep learning algorithms |
| | GPU Support | GPU execution support |
| | Flexibility to Tune Model | Level at which the model parameters can be tuned |
| | Model Examination Flexibility | Ability to examine the model helps to deep dive into what is happening behind the model |
| | Ease in Switching between Models | Switch between different models to find a suitable choice |
| Data Visualization | Visualize and Plot the Results | Availability of different charts to visualize the output |
| Productionizing | Ease of deploying the model in a production use case in a web environment | Ability to deploy the model on the web |
| | Run in large scale deployment | Scale to handle huge volumes of data |
| Support | Official / Community Support with Active Development | Commercial support availability for the platform / solution; active community development |
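One way to put the feature set above to work is a simple weighted scoring sheet. The sketch below is only an illustration of that idea: the feature keys, the weights, the 0 to 5 scoring scale, and the candidate names `platform_a` and `platform_b` are all hypothetical placeholders, not measurements of any real product. Each team would substitute its own weights and its own assessment.

```python
# Hypothetical weights per high-level feature (higher = more important).
# These numbers are placeholders for illustration only.
FEATURE_WEIGHTS = {
    "data_storage": 2,
    "data_preparation": 3,
    "model": 5,
    "productionizing": 4,
    "support": 1,
}


def weighted_score(scores, weights=FEATURE_WEIGHTS):
    """Return the weighted average of per-feature scores (0-5 scale)."""
    missing = set(weights) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    total_weight = sum(weights.values())
    return sum(weights[f] * scores[f] for f in weights) / total_weight


def rank_platforms(candidates, weights=FEATURE_WEIGHTS):
    """Return candidate names sorted by weighted score, best first."""
    return sorted(
        candidates,
        key=lambda name: weighted_score(candidates[name], weights),
        reverse=True,
    )


# Hypothetical assessments of two unnamed candidates.
candidates = {
    "platform_a": {"data_storage": 2, "data_preparation": 3, "model": 5,
                   "productionizing": 2, "support": 4},
    "platform_b": {"data_storage": 5, "data_preparation": 4, "model": 3,
                   "productionizing": 4, "support": 3},
}

ranking = rank_platforms(candidates)
for name in ranking:
    print(name, round(weighted_score(candidates[name]), 2))
```

The ranking is only as good as the weights, so it is worth agreeing on them before scoring any candidate.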

 

 

Now let’s look at the different machine learning solutions / platforms available in the market and where they stand with respect to addressing these feature requirements.

| Solution | Language / Type | Pros | Cons |
|---|---|---|---|
| RStudio | R | Thousands of packages for different solutions; easy to develop; deep model examination and tuning | Time-consuming execution due to single-threaded nature; not easy to productionize for a web environment |
| Spark ML | Scala, Python, R | Scalable machine learning library; distributed execution utilizing platforms like YARN, Mesos etc.; faster execution; supports multiple languages like Scala, Python, R | New to market; does not have an exhaustive list of algorithm implementations; requires knowledge of the Hadoop ecosystem |
| H2O | Scala, Python, R | Easy integration with platforms like Spark (through Sparkling Water) and R; connects to data from HDFS, S3, NoSQL DBs etc. | Compatibility between H2O and Spark with Sparkling Water; no support for Scala in H2O notebooks |
| TensorFlow | Python, C++ | Flexible architecture that can be deployed to run on CPU / GPU; effective utilization of underlying hardware; stronger in deep learning implementations | Learning curve is comparatively steeper; generally meant for neural network based implementations |
| Matlab | Matlab | Advanced toolbox with a wide variety of algorithm implementations; algorithms can be packaged as Java or .NET packages for deployment | Requires learning the Matlab language; expensive product |
| Anaconda | Python | Good collection of algorithm implementations; easy to learn and develop; integration with PySpark; good for local usage and trials | Enterprise license cost; advanced features are licensed and expensive |
| Turi | Python | SFrame concept aims for distributed machine learning executions; can read and process from HDFS, S3 etc.; simplified machine learning executions | Commercially licensed product |
| IBM Watson | PaaS for ML | PaaS platform for machine learning on IBM Bluemix; easy integration with social, cloud; end-to-end solution development with limited knowledge; easy to deploy | Limited control in model creation & tuning; limited control over underlying infrastructure |
| Azure ML | PaaS for ML | PaaS platform for machine learning on Microsoft Azure; workflow-based ML solution on Azure; easy to develop ML solutions on Azure cloud | Limited control in model creation & tuning; limited control over underlying infrastructure |
| AWS ML | PaaS for ML | PaaS platform for machine learning on AWS; easy to develop ML solutions on AWS cloud | Limited control in model creation & tuning; limited control over underlying infrastructure |

 

Packaged machine learning solutions like RStudio, H2O, Anaconda, and Turi are working to improve how they access and store data on distributed storage, and to add distributed multi-thread / multi-core / multi-node execution for time-consuming tasks like data preparation, feature extraction, and model creation.
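To make the idea of splitting a time-consuming preparation task across workers concrete, here is a minimal sketch using only the Python standard library. It is not how any of the products above implement distribution; the toy `extract_features` function, the chunk size, and the sample rows are assumptions made for illustration. A thread pool is used so the sketch stays self-contained; in standard CPython, CPU-bound work would need a process pool or a cluster framework such as Spark to actually run in parallel.

```python
from concurrent.futures import ThreadPoolExecutor


def extract_features(row):
    """Toy feature extraction: character length and word count of a record."""
    return {"length": len(row), "words": len(row.split())}


def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def process_chunk(chunk):
    """Extract features for every row in one chunk."""
    return [extract_features(row) for row in chunk]


def extract_parallel(rows, chunk_size=2, max_workers=4):
    """Split rows into chunks, process the chunks concurrently, keep order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the order the chunks were submitted.
        results = pool.map(process_chunk, chunked(rows, chunk_size))
        return [features for chunk_result in results for features in chunk_result]


# Hypothetical input records.
rows = ["spark ml", "h2o sparkling water", "tensorflow", "azure ml studio", "r"]
features = extract_parallel(rows)
print(features[0])
```

The same split, apply, combine shape is what the distributed platforms provide at cluster scale, with the added work of moving data and handling worker failures.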

Machine learning PaaS solutions like IBM Watson, Azure ML, and AWS ML benefit from their cloud background: they abstract away the overhead of packaging and aim for easy deployment and scalability. These PaaS solutions are limited in how finely models and algorithms can be tuned, and are trying to improve in that space.


Comment by Alexandre P. Kuzin on October 7, 2016 at 10:06am

It is a good guide in the ML sea.
