
Andrea Manero-Bastin's Blog (25)

Unsolved Problems in Machine Learning

Quora contribution written by Chomba Bupe.

I am actually not even aware of any machine learning (ML) problem that is considered to have been solved recently or in the past. This tells you a lot about how hard things really are in ML. Of course, if you read media outlets, it may seem like researchers are sweeping the floor clean with deep learning (DL), solving ML problems one…

Added by Andrea Manero-Bastin on April 21, 2019 at 6:00am

Astonishing Hierarchy of Machine Learning Needs

This article was written by V Sharma.

Artificial intelligence and machine learning are often used interchangeably, but they are not the same. Machine learning is one of the most active fields and one way to achieve AI. There are a couple of reasons why ML works so well today. Machine learning depends entirely upon…

Added by Andrea Manero-Bastin on April 10, 2019 at 2:30am

Robust Regressions: Dealing with Outliers

This article was written by Michael Grogan.

A dataset often contains significant outliers, that is, observations that lie far outside the range of most other observations in the dataset. Let us see how we can use robust regressions to deal with this issue.

I described in…

Added by Andrea Manero-Bastin on April 4, 2019 at 9:30am
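The idea of a regression that resists outliers can be illustrated with a minimal Python sketch. It compares ordinary least squares with the Theil-Sen estimator (slope = median of all pairwise slopes), which is one simple robust method and not necessarily the method used in the linked article. The function names `ols_fit` and `theil_sen_fit` and the toy data are hypothetical, chosen only for illustration.

```python
from itertools import combinations
from statistics import median
from typing import List, Tuple


def ols_fit(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = a + b*x; returns (intercept, slope)."""
    n = len(x)
    x_mean, y_mean = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sxy / sxx
    return y_mean - slope * x_mean, slope


def theil_sen_fit(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Theil-Sen: slope = median of all pairwise slopes, then a median intercept."""
    slopes = [
        (y[j] - y[i]) / (x[j] - x[i])
        for i, j in combinations(range(len(x)), 2)
        if x[j] != x[i]  # skip pairs with identical x (undefined slope)
    ]
    slope = median(slopes)
    intercept = median([yi - slope * xi for xi, yi in zip(x, y)])
    return intercept, slope


# Toy data: y = 2x + 1 with alternating +/-0.5 noise, plus one gross outlier.
x = [float(i) for i in range(10)]
y = [2 * xi + 1 + (0.5 if i % 2 == 0 else -0.5) for i, xi in enumerate(x)]
y[9] = 100.0

a_ols, b_ols = ols_fit(x, y)
a_ts, b_ts = theil_sen_fit(x, y)
print(f"OLS:       intercept={a_ols:.2f} slope={b_ols:.2f}")
print(f"Theil-Sen: intercept={a_ts:.2f} slope={b_ts:.2f}")
```

On this toy data the single outlier pulls the OLS slope above 6, while the Theil-Sen slope stays at the underlying value of 2, because one bad point affects only a minority of the pairwise slopes.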

Five Industries Where Blockchain Has Innovated Beyond Cryptocurrency

This article was written by Patricia Jones.

The advent of the digital age has led to several innovations that will play a huge role in the future of international society, and one of these is blockchain technology. The…

Added by Andrea Manero-Bastin on April 4, 2019 at 9:00am

Advanced cross-validation tips for time series

This article was written by Datapred.

In a previous post, we explained the concept of cross-validation for time series, aka backtesting, and why proper backtests matter for time series modeling.

The goal here is to dig deeper and discuss a few coding tips that will help you cross-validate your predictive models correctly.…

Added by Andrea Manero-Bastin on March 28, 2019 at 10:04am
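One common way to backtest is an expanding-window (walk-forward) split, in which each fold trains only on observations that come before its test window, so the model never sees the future. The following is a minimal Python sketch of such a splitter under that assumption; the function name `walk_forward_splits` and its parameters are hypothetical, and the linked article may recommend other schemes.

```python
from typing import Iterator, List, Tuple


def walk_forward_splits(
    n_samples: int, n_folds: int, test_size: int, min_train: int
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for an expanding-window backtest.

    Test windows are consecutive, non-overlapping, and placed at the end of
    the series; each fold trains on every point before its test window.
    """
    first_test_start = n_samples - n_folds * test_size
    if first_test_start < min_train:
        raise ValueError("not enough history for the requested folds")
    for k in range(n_folds):
        test_start = first_test_start + k * test_size
        train = list(range(0, test_start))  # past observations only
        test = list(range(test_start, test_start + test_size))
        yield train, test


for train, test in walk_forward_splits(n_samples=10, n_folds=3, test_size=2, min_train=3):
    print(f"train={train[0]}..{train[-1]}  test={test}")
```

Unlike shuffled k-fold splits, every training index here is strictly smaller than every test index, which is the property a time series backtest needs.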

Efficient Tuning of Online Systems Using Bayesian Optimization

This article, from the Facebook research team, was written by Ben Letham, Brian Karrer, Guilherme Ottoni and Eytan Bakshy.…

Added by Andrea Manero-Bastin on March 19, 2019 at 9:30am

Demystifying the Math of Support Vector Machines (SVM)

This article was written by Krishna Kumar Mahto.

So, three days into SVM, I was 40% frustrated, 30% restless, 20% irritated and 100% inefficient in terms of getting my work done. I was stuck with the Maths part of Support Vector Machine. I went through a number of YouTube videos, a number of documents, PPTs and PDFs of lecture notes, but…

Added by Andrea Manero-Bastin on March 14, 2019 at 6:30am

How to Configure the Number of Layers and Nodes in a Neural Network

This article was written by Jason Brownlee.

Artificial neural networks have two main hyperparameters that control the architecture or topology of the network: the number of layers and the number of nodes in each hidden layer. You must specify values for these parameters when configuring your network. The most reliable way to configure…

Added by Andrea Manero-Bastin on February 17, 2019 at 2:00am
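To make the two architecture hyperparameters concrete, the sketch below counts the trainable weights and biases of a fully connected network from its list of layer sizes. It assumes plain dense layers with one bias per node, and the function name `count_parameters` is hypothetical. It only shows how the number of layers and nodes drives model size; it is not a method for choosing them.

```python
from typing import List


def count_parameters(layer_sizes: List[int]) -> int:
    """Trainable parameters in a fully connected network.

    layer_sizes = [n_inputs, hidden_1, ..., hidden_k, n_outputs].
    A dense layer from m nodes to n nodes has m*n weights plus n biases.
    """
    if len(layer_sizes) < 2:
        raise ValueError("need at least an input and an output layer")
    return sum(m * n + n for m, n in zip(layer_sizes[:-1], layer_sizes[1:]))


# Same 4 inputs and 3 outputs, different hidden-layer configurations.
for config in ([4, 3], [4, 8, 3], [4, 8, 8, 3], [4, 32, 3]):
    print(config, count_parameters(config))
```

Adding a second hidden layer of 8 nodes roughly doubles the parameter count here, while widening a single hidden layer to 32 nodes grows it even faster, which is why both hyperparameters matter when configuring a network.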

Data Science Jargon Explained to the Layman

This article was written by Enda Ridge.

Data Scientists need to communicate without jargon so customers understand, believe and care about their recommendations. Here is a Data Science jargon buster to help.

Data Science is a technical…

Added by Andrea Manero-Bastin on February 17, 2019 at 1:30am

What Are Machine Learning Models Hiding?

This article was written by Vitaly Shmatikov.

Machine learning is eating the world. The abundance of training data has helped ML achieve amazing results for object recognition, natural language processing, predictive analytics, and all manner of other tasks. Much of this training data is very sensitive, including personal photos, search queries,…

Added by Andrea Manero-Bastin on February 17, 2019 at 1:30am

Cross-Validation: Concept and Example in R

This article was written by Sondos Atwi.

In machine learning, cross-validation is a resampling method used for model evaluation that avoids testing a model on the same dataset on which it was trained. Testing on the training data is a common mistake, especially since a separate test dataset is not always available. However, it usually leads to inaccurate performance measures (as the model will have an almost perfect…

Added by Andrea Manero-Bastin on February 10, 2019 at 10:30am
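The resampling idea can be sketched as a k-fold splitter: the indices are shuffled once, divided into k roughly equal folds, and each fold serves as the test set exactly once while the remaining folds are used for training. The entry's own example is in R; the Python below is only an illustration, and the function name `k_fold_indices` is hypothetical.

```python
import random
from typing import List, Tuple


def k_fold_indices(n: int, k: int, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Return k (train, test) index splits; every index is tested exactly once."""
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and n")
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # shuffle once, reproducibly
    # Spread the remainder so fold sizes differ by at most one.
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    splits, start = [], 0
    for size in sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        splits.append((sorted(train), sorted(test)))
        start += size
    return splits


for train, test in k_fold_indices(n=10, k=3):
    print(len(train), len(test), test)
```

Averaging a model's score over the k test folds gives the cross-validated estimate, and no score is ever computed on data the model was trained on.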

Cross-Validation: Concept and Example in R

This article was written by Sondos Atwi.

What is Cross-Validation?

In machine learning, cross-validation is a resampling method used for model evaluation that avoids testing a model on the same dataset on which it was trained. Testing on the training data is a common mistake, especially since a separate…

Added by Andrea Manero-Bastin on January 28, 2019 at 11:30pm

Universal Method to Sort Complex Information Found

This article was written by Kevin Hartnett.

The nearest neighbor problem asks where a new point fits into an existing data set. A few researchers set out to prove that there was no universal way to solve it. Instead, they found such a way.…

Added by Andrea Manero-Bastin on January 15, 2019 at 9:00am
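The simplest exact answer to the nearest neighbor problem is brute-force search: compare the new point against every stored point and keep the closest, at a cost that grows linearly with the size of the dataset for each query. The sketch below does this with Euclidean distance in Python; it is not the universal method described in the article, and the function name `nearest_neighbor` is hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def nearest_neighbor(points: List[Sequence[float]], query: Sequence[float]) -> Tuple[int, float]:
    """Exact nearest neighbor by linear scan; returns (index, distance).

    Cost is O(n * d) per query for n stored points in d dimensions.
    """
    if not points:
        raise ValueError("empty data set")
    best_index, best_dist = -1, math.inf
    for i, p in enumerate(points):
        d = math.dist(p, query)  # Euclidean distance (Python 3.8+)
        if d < best_dist:
            best_index, best_dist = i, d
    return best_index, best_dist


data = [(0.0, 0.0), (5.0, 5.0), (1.0, 2.0), (9.0, 1.0)]
print(nearest_neighbor(data, (1.5, 1.5)))
```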

The Math Required for Machine Learning

This article was written by Harsh Sikka. This version is a summary of the original article.

Start with …

Added by Andrea Manero-Bastin on January 9, 2019 at 7:30am

The 10 Statistical Techniques Data Scientists Need to Master

This article was written by James Le. Here is a brief summary; a link to the full article is provided at the bottom. Some techniques are not covered in Le's article, for instance neural networks, K-NN, density estimation, time series models, survival analysis, Markov chains, Bayesian statistics, graph models, and spatial processes. However, his article is a great read, with the 10 topics explained in detail,…

Added by Andrea Manero-Bastin on January 9, 2019 at 7:30am

What is a Generative Adversarial Network?

This article was written by Hunter Heidenreich.

A look at what a generative adversarial network is, to understand how GANs work.

What’s…

Added by Andrea Manero-Bastin on January 2, 2019 at 9:30am
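For reference, the adversarial game behind a GAN is commonly written as the minimax objective of the original GAN formulation, in which a generator G maps random noise z (drawn from a prior p_z) to synthetic samples and a discriminator D outputs the probability that its input came from the real data distribution p_data. This is the standard textbook form, not necessarily the notation used in the linked article.

```latex
\min_G \max_D \; V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big]
```

D is trained to maximize V(D, G), that is, to tell real samples from generated ones, while G is trained to minimize it, that is, to produce samples that D classifies as real.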

How to Visualize a Decision Tree from a Random Forest in Python using Scikit-Learn

This article was written by Will Koehrsen.

Here’s the complete code: just copy and paste into a Jupyter Notebook or Python script, replace with your data and run:

The final result is a complete decision tree as…

Added by Andrea Manero-Bastin on December 22, 2018 at 7:30am
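The code referred to in the entry is not reproduced here. As a hedged sketch of the same idea, the Python below (assuming scikit-learn 0.21 or later is installed) trains a small random forest on the built-in Iris data and prints one of its trees as text with `sklearn.tree.export_text`; `sklearn.tree.export_graphviz` can instead write a `.dot` file that Graphviz renders as an image. The dataset and parameter values are illustrative assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import export_text

iris = load_iris()

# A small forest; max_depth keeps the printed tree short (illustrative values).
model = RandomForestClassifier(n_estimators=10, max_depth=3, random_state=0)
model.fit(iris.data, iris.target)

# Each fitted tree of the forest is a DecisionTreeClassifier in estimators_.
estimator = model.estimators_[0]

tree_text = export_text(estimator, feature_names=list(iris.feature_names))
print(tree_text)

# Alternative that produces an image (needs the Graphviz `dot` program):
# from sklearn.tree import export_graphviz
# export_graphviz(estimator, out_file="tree.dot", feature_names=iris.feature_names,
#                 class_names=iris.target_names, rounded=True, filled=True)
```

Any index into `model.estimators_` works; the trees differ because each is grown on a different bootstrap sample with random feature subsets.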

7 Classical Assumptions of Ordinary Least Squares (OLS) Linear Regression

This article was written by Jim Frost. Here we present a summary, with link to the original article.

Ordinary Least Squares (OLS) is the most common estimation method for linear models—and that’s true for a good reason. As long as your model satisfies the OLS assumptions for linear…

Added by Andrea Manero-Bastin on December 13, 2018 at 5:00pm
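As a small illustration of how residuals are used to check OLS assumptions, the Python sketch below fits a one-predictor OLS model and computes two diagnostics: the mean of the residuals (zero by construction when the model has an intercept) and the Durbin-Watson statistic, which is close to 2 when consecutive residuals are uncorrelated. The function names and toy data are hypothetical; a real analysis would normally use a statistics package.

```python
from typing import List, Tuple


def ols_simple(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Closed-form OLS for y = b0 + b1*x; returns (b0, b1)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
        (xi - x_bar) ** 2 for xi in x
    )
    return y_bar - b1 * x_bar, b1


def durbin_watson(residuals: List[float]) -> float:
    """Sum of squared successive residual differences over the residual sum of squares."""
    num = sum((e1 - e0) ** 2 for e0, e1 in zip(residuals[:-1], residuals[1:]))
    return num / sum(e * e for e in residuals)


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
b0, b1 = ols_simple(x, y)
resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
print(f"b0={b0:.2f} b1={b1:.2f}")  # b0=2.20 b1=0.60
print(f"mean residual={sum(resid) / len(resid):.2e}")
print(f"Durbin-Watson={durbin_watson(resid):.3f}")
```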

The Startup Founder’s Guide to Analytics

This article was written by Tristan Handy.

This post is about how to create the analytics competency at your organization. It’s not about what metrics to track (there are plenty of good posts about that); it’s about how to actually get your business to produce them. As it turns out, the implementation question - How do I build a…

Added by Andrea Manero-Bastin on November 27, 2018 at 9:00pm

Seven Techniques for Data Dimensionality Reduction

This article comes from the KNIME blog. Below is a summary. The highest reduction ratio without performance degradation is obtained by analyzing the decision cuts in many random forests (Random Forests/Ensemble Trees). However, even just counting the number of missing values, measuring the column variance, and measuring the correlation of pairs of columns can lead to a satisfactory reduction rate…

Added by Andrea Manero-Bastin on November 17, 2018 at 8:30pm

© 2019 Data Science Central ®