Featured Blog Posts – September 2017 Archive (72)

How I Detect Fake News

This article was written by Tim O'Reilly.…


Added by Amelia Matteson on September 6, 2017 at 2:30pm — 1 Comment

Naive Principal Component Analysis in R

Principal Component Analysis (PCA) is a technique used to find the core components that underlie different variables. It is particularly useful whenever doubts arise about the true origin of three or more variables. There are two main methods for performing a PCA: naive and less naive. In the naive method, you first check some conditions in your data, which determine the essentials of the analysis. In the less-naive method, you set those yourself,…


Added by Pablo Bernabeu on September 6, 2017 at 1:30pm — No Comments

How to Train a Final Machine Learning Model

In this post, you discovered how to train a final machine learning model for operational use. You have overcome obstacles to finalizing your model, such as:

  • Understanding the goal of resampling procedures such as train-test splits and k-fold cross validation.
  • Model finalization as training a new model on all available data.
  • Separating the concern of estimating performance from finalizing the model.…
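
The separation between estimating performance and finalizing the model can be sketched in a few lines. This is a minimal illustration and not the code from the original post: the toy data, the one-variable least-squares model, and the helper names `fit_line`, `mse`, and `k_fold_mse` are all assumptions made for the example.

```python
import random
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b on one input variable."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


def mse(model, xs, ys):
    """Mean squared error of a fitted (slope, intercept) pair."""
    a, b = model
    return mean((a * x + b - y) ** 2 for x, y in zip(xs, ys))


def k_fold_mse(xs, ys, k=5, seed=0):
    """Estimate out-of-sample error; the fold models are thrown away."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        model = fit_line([xs[i] for i in train], [ys[i] for i in train])
        scores.append(mse(model,
                          [xs[i] for i in held_out],
                          [ys[i] for i in held_out]))
    return mean(scores)


# Hypothetical toy data: y = 2x + 1 plus Gaussian noise.
rng = random.Random(42)
xs = [float(i) for i in range(50)]
ys = [2.0 * x + 1.0 + rng.gauss(0, 1) for x in xs]

# Concern 1: estimate how well this modelling procedure performs.
estimated_error = k_fold_mse(xs, ys, k=5)

# Concern 2: finalize by training a new model on all available data.
final_model = fit_line(xs, ys)
```

The k-fold score describes the procedure, not any single fitted model, which is why the fold models can be discarded before the final fit.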


Added by Vincent Granville on September 6, 2017 at 7:01am — No Comments

14 Great Articles About Cross-Validation, Model Fitting and Selection

Cross-validation is a technique used to assess the accuracy of a predictive model, based on training set data. It splits the training set into test and control sets. The test sets are used to fine-tune the model to increase performance (better classification rate or reduced errors in prediction), and the control sets are used to simulate how the model would perform outside the training set. The control and test sets must be carefully chosen for this method to make…


Added by Vincent Granville on September 6, 2017 at 7:00am — No Comments

How artificial intelligence transforms business?

Artificial intelligence is now part of our daily lives and is deployed in more and more business sectors, shaking up human expertise. Artificial intelligence is expected to transform one job in two, but does not necessarily represent a threat. In fact, these jobs should be redirected to less repetitive tasks with more added value.

 According to a PwC study from March 2017, 70% of the jobs in the energy sector and 65% of the jobs in the consumer sector could be…


Added by Valérie Burel on September 6, 2017 at 7:00am — No Comments

Why do Decision Trees Work?

This article is from Win-Vector LLC

In this article we will discuss the machine learning method called “decision trees”, moving quickly over the usual “how decision trees work” and spending time on “why decision trees work.” We will write from a computational learning theory perspective, and hope this helps make both decision trees and computational learning theory more comprehensible. The goal…


Added by Amelia Matteson on September 5, 2017 at 10:00am — No Comments

Dealing With Imbalanced Datasets

Summary: Dealing with imbalanced datasets is an everyday problem. SMOTE (Synthetic Minority Oversampling TEchnique) and its variants solve this problem through oversampling, and have recently become a very popular way to improve model performance.
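
As a rough illustration of the oversampling idea, the sketch below creates synthetic minority points by interpolating between a minority sample and one of its nearest minority neighbours. It is a simplified sketch of the basic SMOTE idea, not the article's code or a library implementation; the function name `smote`, its parameters, and the toy points are assumptions for the example.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points from a list of minority-class tuples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        # Pick a random minority sample as the starting point.
        i = rng.randrange(len(minority))
        base = minority[i]
        # Find its k nearest neighbours within the minority class only.
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        # Place the new point at a random position on the segment
        # between the sample and the chosen neighbour.
        gap = rng.random()
        synthetic.append(tuple(b + gap * (n - b)
                               for b, n in zip(base, neighbour)))
    return synthetic


# Hypothetical minority class with four 2-D points.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote(minority, n_new=6, k=2)
```

Because every synthetic point lies on a segment between two real minority points, the new samples stay inside the region the minority class already occupies.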


There are some problems that never go away. …


Added by William Vorhies on September 5, 2017 at 8:14am — 3 Comments

13,500 Nastygrams to Train Algorithms to Detect Undesirable Content

This article was written by Tom Simonite.

The nonprofit behind Wikipedia is teaming up with…


Added by Amelia Matteson on September 4, 2017 at 1:00pm — No Comments

Why You Need a (Big) Data Management Platform for Your Digital Transformation

Digital transformation is underway in practically every industry in the world. Companies, businesses and organizations throughout the world are leveraging their assets, big data and analytics for an edge over their competitors. In fact, data analytics and big data have gained popularity to the extent that data analysis for differentiation is…


Added by Ronald van Loon on September 3, 2017 at 11:30pm — No Comments

6 Easy Steps to Learn Naive Bayes Algorithm (with code in Python)

This article was posted by Sunil Ray. Sunil is a Business Analytics and BI professional.


Here’s a situation you’ve got…


Added by Emmanuelle Rieuf on September 3, 2017 at 7:30am — No Comments

Weekly Digest, September 4

Monday newsletter published by Data Science Central. Previous editions can be found here.  The contribution flagged with a + is our selection for the picture of the week.

Featured Resources and Technical Contributions


Added by Vincent Granville on September 2, 2017 at 8:00am — No Comments

Distributed K-Means with R-Hadoop

In this article, an R-Hadoop (with rmr2) implementation of distributed K-Means clustering is described using a sample 2-D dataset.

  1. First the dataset shown below is horizontally partitioned into 4 data subsets and they are copied from local to HDFS, as shown in the following animation. The dataset chosen is small enough and it’s just for the POC purpose,…
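
The teaser is cut off before the remaining steps, so the following is only a sketch of the common map/reduce formulation of K-Means, written in plain Python, not with rmr2 or Hadoop. The function names, the toy points, and the starting centroids are all assumptions; the article's actual implementation may differ.

```python
import math
from collections import defaultdict


def map_partition(points, centroids):
    """Map step: per-partition partial sums keyed by nearest centroid."""
    partial = defaultdict(lambda: [0.0, 0.0, 0])
    for x, y in points:
        cid = min(range(len(centroids)),
                  key=lambda c: math.dist((x, y), centroids[c]))
        acc = partial[cid]
        acc[0] += x
        acc[1] += y
        acc[2] += 1
    return dict(partial)


def reduce_partials(partials, centroids):
    """Reduce step: merge partial sums and compute the new centroids."""
    totals = defaultdict(lambda: [0.0, 0.0, 0])
    for partial in partials:
        for cid, (sx, sy, n) in partial.items():
            t = totals[cid]
            t[0] += sx
            t[1] += sy
            t[2] += n
    # A centroid that received no points keeps its previous position.
    return [
        (totals[c][0] / totals[c][2], totals[c][1] / totals[c][2])
        if c in totals else centroids[c]
        for c in range(len(centroids))
    ]


def distributed_kmeans(partitions, centroids, iterations=10):
    """Repeat map and reduce for a fixed number of iterations."""
    for _ in range(iterations):
        partials = [map_partition(p, centroids) for p in partitions]
        centroids = reduce_partials(partials, centroids)
    return centroids


# Hypothetical 2-D data, horizontally partitioned into 4 subsets.
data = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (0.3, 0.2),
        (5.0, 5.0), (5.2, 5.1), (5.1, 4.9), (4.9, 5.2)]
partitions = [data[i::4] for i in range(4)]
centroids = distributed_kmeans(partitions, [(0.0, 0.0), (5.0, 5.0)])
```

Each partition only emits per-centroid sums and counts, so the amount of data passed to the reduce step depends on the number of clusters and not on the number of points.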

Added by Sandipan Dey on September 1, 2017 at 11:30am — No Comments
