
How to Choose a Machine Learning Model – Some Guidelines

In this post, we explore some broad guidelines for selecting machine learning models.

The overall steps for Machine Learning/Deep Learning are:

  • Collect data
  • Check for anomalies and missing data, and clean the data
  • Perform statistical analysis and initial visualization
  • Build models
  • Check the accuracy
  • Present the results
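
The steps above can be walked through end to end on a toy problem. The following is a minimal sketch in plain Python, not the author's method. The dataset is hypothetical, missing values are simply dropped, and the "model" is a deliberately simple threshold halfway between the two class means. It stands in for whatever model you would actually build.

```python
import statistics

# 1. Collect data (hypothetical toy data): (feature, label) pairs; None marks a missing value.
raw = [(1.0, 0), (7.5, 1), (2.0, 0), (None, 1), (8.0, 1), (1.5, 0),
       (9.0, 1), (2.5, 0), (None, 0), (6.5, 1), (3.0, 0), (8.5, 1)]

# 2. Check for missing data and clean: here incomplete rows are simply dropped.
clean = [(x, y) for x, y in raw if x is not None]

# Hold out every fifth row for evaluation and train on the rest.
train = [row for i, row in enumerate(clean) if i % 5 != 4]
test = [row for i, row in enumerate(clean) if i % 5 == 4]

# 3. Statistical analysis: per-class mean and standard deviation of the feature.
def class_stats(rows, label):
    values = [x for x, y in rows if y == label]
    return statistics.mean(values), statistics.stdev(values)

mean0, sd0 = class_stats(train, 0)
mean1, sd1 = class_stats(train, 1)

# 4. Build a model: predict class 1 above the midpoint of the class means
#    (assumes class 1 has the larger mean, as it does in this toy data).
threshold = (mean0 + mean1) / 2

def predict(x):
    return 1 if x > threshold else 0

# 5. Check the accuracy on the held-out rows.
accuracy = sum(predict(x) == y for x, y in test) / len(test)

# 6. Present the results.
print(f"class means: {mean0:.3f} (sd {sd0:.3f}) vs {mean1:.3f} (sd {sd1:.3f})")
print(f"threshold: {threshold:.3f}, held-out accuracy: {accuracy:.2f}")
```

In a real project the held-out rows would usually be chosen at random, and there would be far more of them. The fixed every-fifth-row split is used here only so the result is reproducible.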

Machine learning tasks can be classified into:

  • Supervised learning
  • Unsupervised learning
  • Semi-supervised learning
  • Reinforcement learning

Note: this post does not focus on the last two.

Below are some approaches to choosing a model for Machine Learning/Deep Learning.

OVERALL APPROACHES

  • Dealing with unbalanced data: use resampling strategies, such as oversampling the minority class or undersampling the majority class.

  • Creating new features: use principal component analysis (PCA) to reduce dimensionality, autoencoders to learn a latent space and, possibly, clustering to derive new features.

  • Preventing overfitting in linear regression: use regularization techniques such as lasso (L1 penalty) and ridge (L2 penalty), which shrink the coefficients and make the model less sensitive to noise.
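
For the resampling bullet, the simplest strategy is random oversampling: duplicate randomly chosen minority-class examples until the classes are the same size. The sketch below uses a hypothetical `random_oversample` helper and toy data. It shows only one of several possible strategies; undersampling the majority class and synthetic methods such as SMOTE are common alternatives.

```python
import random
from collections import Counter

def random_oversample(rows, labels, seed=0):
    """Return new lists in which every class has as many examples as the largest
    class, by re-drawing random rows of the smaller classes with replacement."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        members = [r for r, lab in zip(rows, labels) if lab == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(members))
            out_labels.append(cls)
    return out_rows, out_labels

# Hypothetical training split: six examples of class 0, only two of class 1.
X_train = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.6], [5.0], [5.1]]
y_train = [0, 0, 0, 0, 0, 0, 1, 1]

X_bal, y_bal = random_oversample(X_train, y_train)
print(Counter(y_train), "->", Counter(y_bal))
```

Resample only the training split. Oversampling before splitting lets copies of the same example end up in both the training and test data, which inflates the measured accuracy. The imbalanced-learn library provides ready-made samplers such as `RandomOverSampler` and `SMOTE`.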

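For the feature-creation bullet, PCA replaces the original features with a smaller number of uncorrelated components. The sketch below computes only the first component, using power iteration on the covariance matrix in plain Python. The function name `first_principal_component` and the data are hypothetical. In practice a library implementation, such as scikit-learn's `PCA`, would normally be used instead.

```python
import math

def first_principal_component(points, iters=200):
    """Return (direction, scores): the unit vector of greatest variance and the
    projection of each centred point onto it (one new feature per point)."""
    n, d = len(points), len(points[0])
    means = [sum(p[j] for p in points) / n for j in range(d)]
    centred = [[p[j] - means[j] for j in range(d)] for p in points]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration converges to the eigenvector with the largest eigenvalue,
    # provided the starting vector is not orthogonal to it.
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centred]
    return v, scores

# Hypothetical 2-D data lying close to the line y = x.
points = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.9), (5.0, 5.1)]
direction, scores = first_principal_component(points)
print("direction:", [round(x, 3) for x in direction])
print("1-D feature:", [round(s, 3) for s in scores])
```

The two original features collapse into one new feature (`scores`) that keeps most of the variance. The sign of the direction is arbitrary, and library implementations may return the opposite sign.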
 
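For the regularization bullet, the effect of ridge and lasso is easiest to see with a single feature, where both have closed-form solutions. This is a minimal sketch that assumes one feature, an unpenalised intercept and toy data. `ridge_fit` and `lasso_fit` are hypothetical helper names.

```python
import math

def _centred_sums(xs, ys):
    """Means of x and y, plus Sxx = sum((x - x_bar)^2) and Sxy = sum((x - x_bar)(y - y_bar))."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return x_bar, y_bar, sxx, sxy

def ridge_fit(xs, ys, lam):
    """Minimise sum((y - a - b*x)^2) + lam * b^2; returns (a, b). The intercept a is not penalised."""
    x_bar, y_bar, sxx, sxy = _centred_sums(xs, ys)
    b = sxy / (sxx + lam)
    return y_bar - b * x_bar, b

def lasso_fit(xs, ys, lam):
    """Minimise 0.5 * sum((y - a - b*x)^2) + lam * |b|; returns (a, b).
    The least-squares slope is soft-thresholded, so a large lam sets it exactly to zero."""
    x_bar, y_bar, sxx, sxy = _centred_sums(xs, ys)
    b = math.copysign(max(abs(sxy) - lam, 0.0), sxy) / sxx
    return y_bar - b * x_bar, b

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.0]
for lam in (0.0, 10.0, 30.0):
    print(f"lam={lam:>4}: ridge slope={ridge_fit(xs, ys, lam)[1]:.3f}, "
          f"lasso slope={lasso_fit(xs, ys, lam)[1]:.3f}")
```

As `lam` grows, ridge shrinks the slope smoothly towards zero. Lasso sets the slope exactly to zero once `lam` exceeds |Sxy|, which is why, with many features, lasso also drops features. Libraries scale the penalty differently; scikit-learn's `Lasso`, for instance, divides the squared error by 2n. Penalty values are therefore not directly comparable across implementations.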

MACHINE LEARNING MODELS

  • Predicting continuous values: linear regression is generally a good first approach for predicting continuous values (e.g. prices).

  • Binary classification: logistic regression is a good starting point for binary classification. Support vector machines (SVMs) are also a good choice for two-class classification.

  • Is there a simplest or easiest model category to start with? Decision trees are often seen as simple to understand and use. They are also the building blocks of ensemble models such as random forests and gradient boosting.

  • Which models are popular on Kaggle? For supervised learning: random forests and XGBoost, a widely used implementation of gradient-boosted trees.

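Logistic regression passes a linear score through the sigmoid function to get a class probability, and it is fitted by minimising the log-loss. The sketch below does this with plain batch gradient descent for a single feature. The data, learning rate and epoch count are illustrative assumptions. In practice, scikit-learn's `LogisticRegression` (or `SVC` for a support vector machine) would be the usual starting point.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.5, epochs=5000):
    """Batch gradient descent on the mean log-loss for one feature plus a bias term."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y   # d(log-loss)/d(logit) for one example
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b

# Hypothetical data: hours studied -> passed (1) or failed (0); the classes overlap.
hours = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
passed = [0, 0, 0, 1, 0, 1, 1, 1]

w, b = fit_logistic(hours, passed)
for h in (1.0, 2.25, 3.5):
    print(f"P(pass | {h} h) = {sigmoid(w * h + b):.2f}")
```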
 
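Decision trees are easy to understand because the learned model is a set of readable if/else rules. The sketch below fits a single-split tree (a "stump") by choosing the threshold with the lowest weighted Gini impurity. `fit_stump`, `majority` and the churn data are hypothetical. A full tree would apply the same search recursively to each side.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels: 2p(1 - p), where p is the share of 1s."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def majority(labels):
    """Most common 0/1 label (ties go to 1)."""
    return int(2 * sum(labels) >= len(labels))

def fit_stump(xs, ys):
    """Fit a depth-1 tree; returns (impurity, threshold, left_label, right_label),
    or None if all x values are equal."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold separates equal values
        t = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for x, y in pairs if x <= t]
        right = [y for x, y in pairs if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if best is None or score < best[0]:
            best = (score, t, majority(left), majority(right))
    return best

# Hypothetical data: number of support tickets -> customer churned (1) or not (0).
tickets = [1, 2, 3, 4, 5, 6]
churned = [0, 0, 0, 1, 1, 1]
score, threshold, left_label, right_label = fit_stump(tickets, churned)
print(f"if tickets <= {threshold}: predict {left_label} else predict {right_label}")
```

Random forests average many deeper trees, each trained on a bootstrap sample of the rows, with a random subset of features considered at each split.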

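Gradient boosting builds an ensemble of trees sequentially, with each new tree fitted to the errors left by the trees so far. The sketch below assumes squared-error loss, where those errors are simply the residuals, and uses depth-1 regression trees on toy data. XGBoost follows the same additive idea but adds regularisation and many engineering optimisations. Treat this as an illustration of the principle rather than of XGBoost itself.

```python
def fit_regression_stump(xs, targets):
    """Depth-1 regression tree: (threshold, left_value, right_value) minimising squared
    error. Assumes at least two distinct x values."""
    pairs = sorted(zip(xs, targets))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue
        t = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [r for x, r in pairs if x <= t]
        right = [r for x, r in pairs if x > t]
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lv, rv)
    return best[1:]

def gradient_boost(xs, ys, n_rounds=50, learning_rate=0.1):
    """Each stump is fitted to the current residuals (the negative gradient of
    squared loss) and added to the ensemble with a small learning rate."""
    base = sum(ys) / len(ys)
    preds = [base] * len(xs)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        t, lv, rv = fit_regression_stump(xs, residuals)
        stumps.append((t, lv, rv))
        preds = [p + learning_rate * (lv if x <= t else rv) for x, p in zip(xs, preds)]

    def predict(x):
        return base + learning_rate * sum(lv if x <= t else rv for t, lv, rv in stumps)
    return predict

# Hypothetical toy regression data.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 1.2, 1.9, 2.1, 4.8, 5.1, 5.9, 6.2]
model = gradient_boost(xs, ys)

def mse(f):
    return sum((y - f(x)) ** 2 for x, y in zip(xs, ys)) / len(xs)

baseline = sum(ys) / len(ys)
print(f"MSE of constant baseline: {mse(lambda x: baseline):.3f}")
print(f"MSE after boosting:       {mse(model):.3f}")
```
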
DEEP LEARNING MODELS

  • When the relevant features are complex and hard to specify by hand, but you have a large number of labelled examples: multi-layer perceptrons (MLPs).

  • Vision-based machine learning (image classification, object detection, image segmentation): convolutional neural networks (CNNs).

  • Sequence modelling tasks, e.g. text classification or language translation: recurrent neural networks (RNNs), typically LSTMs.
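
The point of a hidden layer is that it can build features a linear model cannot. A classic example is XOR, which no single linear boundary separates. The sketch below uses hand-picked weights, chosen for illustration rather than learned, so that a 2-2-1 multi-layer perceptron with ReLU activations computes XOR exactly. In practice the weights are learned from labelled examples by backpropagation.

```python
def relu(z):
    return max(0.0, z)

def dense(inputs, weights, biases, activation=None):
    """Fully connected layer: out[j] = activation(sum_i inputs[i] * weights[j][i] + biases[j])."""
    outs = []
    for w_row, b in zip(weights, biases):
        z = sum(x * w for x, w in zip(inputs, w_row)) + b
        outs.append(activation(z) if activation else z)
    return outs

# Hand-picked (hypothetical, not trained) weights for a 2-2-1 network.
HIDDEN_W = [[1.0, 1.0], [1.0, 1.0]]
HIDDEN_B = [0.0, -1.0]
OUT_W = [[1.0, -2.0]]
OUT_B = [0.0]

def mlp(x):
    hidden = dense(x, HIDDEN_W, HIDDEN_B, relu)   # hidden features: x1+x2 and max(0, x1+x2-1)
    return dense(hidden, OUT_W, OUT_B)[0]         # linear output layer

for a in (0.0, 1.0):
    for b in (0.0, 1.0):
        print(a, b, mlp([a, b]))
```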

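The core operation of a convolutional layer slides a small kernel over the image and takes a weighted sum at each position. Below is a minimal plain-Python sketch of a "valid" 2-D convolution with stride 1. Technically it is cross-correlation, which is what deep learning libraries compute. The image and the vertical-edge kernel are hypothetical; in a CNN the kernel values are learned.

```python
def conv2d(image, kernel):
    """'Valid' 2-D cross-correlation (what CNN libraries call convolution), stride 1, no padding."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [sum(image[i + u][j + v] * kernel[u][v] for u in range(kh) for v in range(kw))
         for j in range(out_w)]
        for i in range(out_h)
    ]

# Hypothetical 4x6 image: dark (0) on the left, bright (1) on the right.
image = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
]
# Kernel that responds to dark-to-bright vertical edges.
kernel = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]
for row in conv2d(image, kernel):
    print(row)
```

The output is large only where the image changes from dark to bright. Such edge-like patterns are the kind of feature early CNN layers often learn to detect.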
 
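An LSTM processes a sequence one element at a time. It carries a hidden state and a cell state whose updates are controlled by input, forget and output gates. The sketch below implements one step of a single-unit LSTM with the standard gate equations. The weights in `PARAMS` are arbitrary illustrative values, not trained ones. Real models, such as PyTorch's `torch.nn.LSTM`, use vectors and matrices instead of scalars.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical, untrained parameters: (input weight, recurrent weight, bias) per gate.
PARAMS = {
    "i": (1.0, 0.5, 0.0),   # input gate
    "f": (0.5, 0.5, 1.0),   # forget gate
    "o": (1.0, 0.5, 0.0),   # output gate
    "g": (1.0, 0.5, 0.0),   # candidate cell value
}

def lstm_step(x, h_prev, c_prev, p):
    """One time step of a single-unit LSTM; returns the new (hidden, cell) state."""
    def pre(gate):
        w_x, w_h, bias = p[gate]
        return w_x * x + w_h * h_prev + bias
    i = sigmoid(pre("i"))
    f = sigmoid(pre("f"))
    o = sigmoid(pre("o"))
    g = math.tanh(pre("g"))
    c = f * c_prev + i * g      # keep part of the old memory, add gated new content
    h = o * math.tanh(c)        # expose a gated view of the memory
    return h, c

def run_lstm(sequence, params=PARAMS):
    """Run the cell over a sequence and return the final hidden state."""
    h, c = 0.0, 0.0
    for x in sequence:
        h, c = lstm_step(x, h, c, params)
    return h

print(run_lstm([1.0, 2.0, 3.0]), run_lstm([3.0, 2.0, 1.0]))
```

Feeding the same numbers in a different order gives a different final state. This is what makes recurrent models suitable for order-dependent data such as text.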

Comments welcome

Image source: BMJ – what makes machine learning in healthcare so powerful

Comment by sei k on October 21, 2018 at 2:23pm

Very informative article, thank you!

Comment by ajit jaokar on October 19, 2018 at 11:20am

thanks @Nelson

Comment by Nelson Kemboi Yego on October 18, 2018 at 9:27pm

Simple and clear

Comment by ajit jaokar on October 18, 2018 at 8:24pm

@Ramiro @Indu thanks

@Mark: thanks, yes, I am planning to :) but got tied up with teaching (@Oxford), which starts soon (https://www.conted.ox.ac.uk/courses/data-science-for-the-internet-o...), so it's on the cards.

Comment by Mark on October 18, 2018 at 11:29am

Great post. How about adding two other sections, one for outlier detection and one for feature selection (not extraction)?

Comment by Ramiro Arce on October 18, 2018 at 9:53am

Nice overview for beginners, like me :D

Thanks.

Comment by Indu Shahi on October 18, 2018 at 9:39am

Tips for ML and data science folks

© 2018 Data Science Central ®