September 2017 Blog Posts (80)

Getting a new periodic table of elements using AI

"Elementary particles are the building blocks of all matter everywhere in the universe.

Their properties are connected…


Added by Toni Manzano on September 9, 2017 at 6:00am — No Comments

10 Companies Using Machine Learning in Cool Ways

This article was written by Dan Shewan.

If science-fiction movies have taught us anything, it’s that the future is a bleak and terrifying dystopia ruled by murderous sentient robots.

Fortunately, only one of these things is true – but that could soon change, as…


Added by Amelia Matteson on September 8, 2017 at 4:00pm — No Comments

What Skills Do I Need to Become a Data Scientist?

Leveraging big data as an insight-generating engine has driven demand for data scientists at the enterprise level, across all industry verticals. Whether it is to refine the process of product development, help improve customer retention, or mine the data for new business opportunities, organizations are increasingly relying on the expertise of data…


Added by Ronald van Loon on September 8, 2017 at 9:30am — 1 Comment

Introduction to Blockchains & What It Means to Big Data

“Arguably the most significant development in information technology over the past few years, blockchain has the potential to change the way that the world approaches big data, with enhanced security and data quality just two of the benefits afforded to businesses using Satoshi Nakamoto’s landmark technology.”…


Added by Noah Data on September 8, 2017 at 3:00am — No Comments

18 Great Blogs Posted in the last 12 Months

This is part of a new series of articles: once or twice a month, we repost articles that were very popular when first published. These articles are at least 6 months old but no more than 12 months old. The previous digest in this series was posted here a while back.

18 Great Blogs Posted in the last 12…


Added by Vincent Granville on September 7, 2017 at 4:30pm — No Comments

Analyzing the relationship of Twitter users towards brands (e.g., Air Berlin)

Social media platforms such as Twitter and Facebook enable everyone to voice their opinions about topics, companies, and products online.

These comments are a great source for companies to analyze their customers’ opinions about their brand or product. However, with billions of tweets and posts daily, this can take a lot of time.

Unless, of course, you use R. With just a few lines of R code and the help of machine learning, we’re able to build mood monitoring tools quickly,…
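The original post builds its mood monitor in R with machine learning. As a much simpler stand-in (a swapped-in technique, not the author's model), the Python sketch below scores tweets against a tiny hand-made word lexicon and averages the scores per brand. The word lists, the `score_tweet` and `brand_mood` helpers, and the sample tweets are all hypothetical.

```python
import re

# Hypothetical mini-lexicon; a real tool would use a trained model or a curated lexicon.
POSITIVE = {"great", "love", "friendly", "comfortable", "good"}
NEGATIVE = {"delayed", "cancelled", "bad", "lost", "rude"}

def score_tweet(text: str) -> int:
    """Return (#positive words - #negative words) for one tweet."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)

def brand_mood(tweets: list) -> float:
    """Average score across tweets: > 0 leans positive, < 0 leans negative."""
    if not tweets:
        return 0.0
    return sum(score_tweet(t) for t in tweets) / len(tweets)

# Hypothetical sample tweets about an airline brand.
tweets = [
    "Flight delayed again and my bag got lost",
    "Friendly crew, comfortable seats, love it",
    "Cancelled without notice. Bad service.",
]
print([score_tweet(t) for t in tweets])
print(round(brand_mood(tweets), 2))
```

A lexicon ignores negation and sarcasm, which is exactly where a trained classifier like the one in the original post earns its keep.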


Added by Daniel Schmeh on September 7, 2017 at 9:30am — 1 Comment

Book: Mastering Feature Engineering

Feature engineering is essential to applied machine learning, but using domain knowledge to strengthen your predictive models can be difficult and expensive. To help fill the information gap on feature engineering, this complete hands-on guide teaches beginning-to-intermediate data scientists how to work with this widely practiced but little discussed topic.



Added by Emmanuelle Rieuf on September 7, 2017 at 8:00am — No Comments

How I Detect Fake News

This article was written by Tim O'Reilly.…


Added by Amelia Matteson on September 6, 2017 at 2:30pm — 1 Comment

Naive Principal Component Analysis in R

Principal Component Analysis (PCA) is a technique used to find the core components that underlie a set of variables. It is very useful whenever doubts arise about the true origin of three or more variables. There are two main ways of performing a PCA: naive or less naive. In the naive method, you first check some conditions in your data, which will determine the essentials of the analysis. In the less-naive method, you set those yourself,…
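As a rough illustration of what a PCA computes, the NumPy sketch below standardizes the variables, eigendecomposes their correlation matrix, and keeps components whose eigenvalue exceeds 1 (the Kaiser criterion). That retention rule, the `pca` helper, and the simulated data are assumptions for illustration, not necessarily the checks used in the original article's naive method.

```python
import numpy as np

def pca(X: np.ndarray):
    """PCA via eigendecomposition of the correlation matrix.

    X: (n_samples, n_variables) array. Returns eigenvalues (descending),
    loadings (one column per component), and component scores.
    """
    # Standardize each variable (column) to mean 0, sd 1.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    R = np.corrcoef(X, rowvar=False)          # correlation matrix
    eigvals, eigvecs = np.linalg.eigh(R)      # eigh: R is symmetric
    order = np.argsort(eigvals)[::-1]         # sort largest first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = Z @ eigvecs                      # project observations onto components
    return eigvals, eigvecs, scores

rng = np.random.default_rng(0)
latent = rng.normal(size=200)
# Simulated data: three variables driven by one shared factor plus noise.
X = np.column_stack([latent + 0.3 * rng.normal(size=200) for _ in range(3)])
eigvals, loadings, scores = pca(X)
keep = eigvals > 1                            # Kaiser criterion (assumption)
print("eigenvalues:", eigvals.round(2))
print("components retained:", int(keep.sum()))
print("variance explained:", (eigvals / eigvals.sum()).round(2))
```

Because the three variables share one underlying factor, a single large eigenvalue dominates, which is the kind of "true origin" question PCA is meant to answer.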


Added by Pablo Bernabeu on September 6, 2017 at 1:30pm — No Comments

How to Train a Final Machine Learning Model

In this post, you discovered how to train a final machine learning model for operational use. You have overcome obstacles to finalizing your model, such as:

  • Understanding the goal of resampling procedures such as train-test splits and k-fold cross validation.
  • Model finalization as training a new model on all available data.
  • Separating the concern of estimating performance from finalizing the model.…
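A minimal pure-Python sketch of that workflow, assuming a one-feature least-squares line as the model and simulated data: a train-test split is used only to estimate skill, and the model that would be deployed is then refit on all available data. The helper names `fit_line` and `mse` are hypothetical.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def mse(model, xs, ys):
    """Mean squared error of a fitted line on the given points."""
    a, b = model
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)

random.seed(1)
xs = [random.uniform(0, 10) for _ in range(100)]
ys = [2.0 + 3.0 * x + random.gauss(0, 1) for x in xs]   # simulated data

# 1) Estimate performance: fit on a training split, score on held-out data.
idx = list(range(len(xs)))
random.shuffle(idx)
train, test = idx[:70], idx[70:]
trial = fit_line([xs[i] for i in train], [ys[i] for i in train])
estimated_mse = mse(trial, [xs[i] for i in test], [ys[i] for i in test])

# 2) Finalize: discard the trial model and refit on ALL available data.
final_model = fit_line(xs, ys)

print(f"estimated out-of-sample MSE: {estimated_mse:.2f}")
print(f"final model: y = {final_model[0]:.2f} + {final_model[1]:.2f}x")
```

The estimate from step 1 is reported as the expected skill of the final model; the trial model itself is thrown away.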


Added by Vincent Granville on September 6, 2017 at 7:01am — No Comments

14 Great Articles About Cross-Validation, Model Fitting and Selection

Cross-validation is a technique used to assess the accuracy of a predictive model, based on training set data. It splits the training set into test and control sets. The test sets are used to fine-tune the model to increase performance (better classification rate or reduced errors in prediction) and the control sets are used to simulate how the model would perform outside the training set. The control and test sets must be carefully chosen for this method to make…
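The sketch below shows a plain k-fold scheme in Python: the data are shuffled into k folds, each fold is held out exactly once while the model is fit on the rest, and the held-out scores are averaged. Reading the held-out fold as the "control" set described above is our interpretation; the `k_fold_indices` and `cross_validate` names and the mean-predictor baseline are hypothetical.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and split into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(xs, ys, fit, score, k=5):
    """Fit on k-1 folds, score on the held-out fold; return the mean score."""
    folds = k_fold_indices(len(xs), k)
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        scores.append(score(model, [xs[i] for i in held_out], [ys[i] for i in held_out]))
    return sum(scores) / k

def fit_mean(xs, ys):
    """Hypothetical baseline model: always predict the training mean of y."""
    return sum(ys) / len(ys)

def mae(model, xs, ys):
    """Mean absolute error of the constant prediction `model`."""
    return sum(abs(y - model) for y in ys) / len(ys)

xs = list(range(20))
ys = [float(x % 4) for x in xs]
print(round(cross_validate(xs, ys, fit_mean, mae, k=5), 3))
```

Any `fit`/`score` pair with the same call shapes can be plugged in, which keeps the resampling logic separate from the model.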


Added by Vincent Granville on September 6, 2017 at 7:00am — No Comments

How does artificial intelligence transform business?

Artificial intelligence now fits into our daily lives and is deployed in more and more business sectors, shaking up human expertise. Artificial intelligence is expected to transform one job in two, but this does not necessarily represent a threat. Instead, these jobs should be redirected toward less repetitive tasks with more added value.

According to a PwC study from March 2017, 70% of the jobs in the energy sector and 65% of the jobs in the consumer sector could be…


Added by Valérie Burel on September 6, 2017 at 7:00am — No Comments

Why do Decision Trees Work?

This article is from Win-Vector LLC

In this article we will discuss the machine learning method called “decision trees”, moving quickly over the usual “how decision trees work” and spending time on “why decision trees work.” We will write from a computational learning theory perspective, and hope this helps make both decision trees and computational learning theory more comprehensible. The goal…
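Since the article only skims "how decision trees work," here is a minimal refresher: a tree grows by repeatedly choosing the split that most reduces impurity. This Python sketch finds the best single threshold on one numeric feature using Gini impurity, one common criterion among several; the function names and toy data are illustrative assumptions.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(xs, ys):
    """Return (threshold, weighted_gini) for the best split x <= threshold."""
    best = (None, gini(ys))              # baseline: no split at all
    for t in sorted(set(xs))[:-1]:       # splitting at the max would leave one side empty
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        w = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if w < best[1]:                  # keep the first threshold on ties
            best = (t, w)
    return best

xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = ["a", "a", "a", "b", "a", "b", "b", "b"]
print(best_split(xs, ys))
```

A full tree applies this search recursively to each side until a stopping rule (depth, minimum leaf size) kicks in.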


Added by Amelia Matteson on September 5, 2017 at 10:00am — No Comments

Dealing With Imbalanced Datasets

Summary: Dealing with imbalanced datasets is an everyday problem. SMOTE (Synthetic Minority Oversampling TEchnique) and its variants solve this problem through oversampling and have recently become a very popular way to improve model performance.


There are some problems that never go away. …
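The core SMOTE step can be sketched in a few lines: for each synthetic point, take a minority-class sample, pick one of its k nearest minority-class neighbors, and interpolate a random fraction of the way between them. This pure-Python version is a simplified illustration under stated assumptions (Euclidean distance, numeric features only), not a reference implementation; the `smote` name, its parameters, and the sample points are hypothetical.

```python
import math
import random

def smote(minority, n_synthetic, k=3, seed=0):
    """Generate n_synthetic points by interpolating between minority-class neighbors."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbors of `base`, excluding the point itself
        others = [p for j, p in enumerate(minority) if j != i]
        neighbors = sorted(others, key=lambda p: math.dist(base, p))[:k]
        nb = rng.choice(neighbors)
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append(tuple(b + gap * (q - b) for b, q in zip(base, nb)))
    return synthetic

minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3), (0.9, 0.7)]
new_points = smote(minority, n_synthetic=4, k=2)
for p in new_points:
    print(tuple(round(v, 3) for v in p))
```

Because each new point lies on a segment between two real minority samples, the minority region is filled in rather than simply duplicated, which is the advantage over naive random oversampling.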


Added by William Vorhies on September 5, 2017 at 8:14am — 3 Comments

13,500 Nastygrams to Train Algorithms to Detect Undesirable Content

This article was written by Tom Simonite.

The nonprofit behind Wikipedia is teaming up with…


Added by Amelia Matteson on September 4, 2017 at 1:00pm — No Comments

Why You Need a (Big) Data Management Platform for Your Digital Transformation

Digital transformation is underway in practically every industry in the world. Companies, businesses and organizations throughout the world are leveraging their assets, big data and analytics for an edge over their competitors. In fact, data analytics and big data have gained popularity to the extent that data analysis for differentiation is…


Added by Ronald van Loon on September 3, 2017 at 11:30pm — No Comments

6 Easy Steps to Learn Naive Bayes Algorithm (with code in Python)

This article was posted by Sunil Ray. Sunil is a Business Analytics and BI professional.



Here’s a situation you’ve got…
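The article teaches Naive Bayes with Python code. As a self-contained stand-in (not the article's own code or data), the sketch below trains a categorical Naive Bayes classifier with Laplace smoothing: the posterior P(class | features) is taken as proportional to P(class) times the product of P(feature value | class). The `train_nb`/`predict_nb` names and the toy dataset are hypothetical.

```python
from collections import Counter, defaultdict

def train_nb(rows, labels):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (feature_index, class) -> Counter of values
    feature_values = defaultdict(set)    # feature_index -> set of values seen
    for row, y in zip(rows, labels):
        for i, v in enumerate(row):
            value_counts[(i, y)][v] += 1
            feature_values[i].add(v)
    return class_counts, value_counts, feature_values

def predict_nb(model, row, alpha=1.0):
    """Return the class with the highest unnormalized posterior, plus all scores."""
    class_counts, value_counts, feature_values = model
    total = sum(class_counts.values())
    scores = {}
    for y, cy in class_counts.items():
        p = cy / total  # prior P(y)
        for i, v in enumerate(row):
            # Laplace-smoothed likelihood P(x_i = v | y)
            p *= (value_counts[(i, y)][v] + alpha) / (cy + alpha * len(feature_values[i]))
        scores[y] = p
    return max(scores, key=scores.get), scores

# Hypothetical toy data: (outlook, windy) -> play?
rows = [("sunny", "no"), ("sunny", "yes"), ("rainy", "yes"), ("overcast", "no"),
        ("rainy", "no"), ("overcast", "yes"), ("sunny", "no"), ("rainy", "yes")]
labels = ["yes", "no", "no", "yes", "yes", "yes", "yes", "no"]
model = train_nb(rows, labels)
label, scores = predict_nb(model, ("sunny", "no"))
print(label, {k: round(v, 3) for k, v in scores.items()})
```

Dividing each score by their sum turns them into class probabilities; the smoothing term `alpha` keeps an unseen feature value from zeroing out a whole class.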


Added by Emmanuelle Rieuf on September 3, 2017 at 7:30am — No Comments

Weekly Digest, September 4

Monday newsletter published by Data Science Central. Previous editions can be found here. The contribution flagged with a + is our selection for the picture of the week.

Featured Resources and Technical Contributions


Added by Vincent Granville on September 2, 2017 at 8:00am — No Comments

Distributed K-Means with R-Hadoop

In this article, an R-Hadoop implementation (using rmr2) of distributed k-means clustering is described with a sample 2-D dataset.

  1. First, the dataset shown below is horizontally partitioned into 4 data subsets, which are copied from the local file system to HDFS, as shown in the following animation. The dataset is small and was chosen just for proof-of-concept (POC) purposes,…
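The map/reduce structure of one k-means iteration can be mimicked in plain Python, without Hadoop or rmr2: each mapper assigns the points in its partition to the nearest centroid and emits per-cluster partial sums, and the reducer combines those partial sums into new centroids. The partitioning, the `mapper`/`reducer`/`distributed_kmeans` names, and the toy points below are assumptions for illustration, not the article's R code.

```python
import math
from collections import defaultdict

def mapper(partition, centroids):
    """Emit (cluster_id, (sum_x, sum_y, count)) partial aggregates for one partition."""
    partial = defaultdict(lambda: [0.0, 0.0, 0])
    for x, y in partition:
        c = min(range(len(centroids)), key=lambda j: math.dist((x, y), centroids[j]))
        partial[c][0] += x
        partial[c][1] += y
        partial[c][2] += 1
    return [(c, tuple(v)) for c, v in partial.items()]

def reducer(mapped, old_centroids):
    """Combine partial sums per cluster into new centroids."""
    totals = defaultdict(lambda: [0.0, 0.0, 0])
    for pairs in mapped:
        for c, (sx, sy, n) in pairs:
            totals[c][0] += sx
            totals[c][1] += sy
            totals[c][2] += n
    # Keep the old centroid if a cluster received no points.
    return [(totals[c][0] / totals[c][2], totals[c][1] / totals[c][2]) if totals[c][2]
            else old_centroids[c] for c in range(len(old_centroids))]

def distributed_kmeans(partitions, centroids, iterations=10):
    """Repeat map (per partition) and reduce until centroids stop changing."""
    for _ in range(iterations):
        mapped = [mapper(p, centroids) for p in partitions]  # map phase
        new = reducer(mapped, centroids)                    # reduce phase
        if new == centroids:
            break
        centroids = new
    return centroids

points = [(1, 1), (1.5, 2), (2, 1), (8, 8), (8.5, 9), (9, 8), (1, 2), (9, 9)]
partitions = [points[i::4] for i in range(4)]  # 4 horizontal partitions
print(distributed_kmeans(partitions, [(0.0, 0.0), (10.0, 10.0)]))
```

Emitting sums and counts rather than raw points keeps the data shuffled between map and reduce small, which is the main reason this formulation suits Hadoop.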

Added by Sandipan Dey on September 1, 2017 at 11:30am — No Comments

© 2021 TechTarget, Inc.