This is a complete tutorial on data wrangling, or data manipulation, with R. It covers one of the most powerful R packages for data wrangling, dplyr. The package was written by Hadley Wickham, the most popular R programmer, who has also written many other useful R packages such as ggplot2 and tidyr. It is one of the most popular R packages to date. This post includes several examples and tips on how to use the dplyr package for cleaning and transforming data.…Continue
Added by Deepanshu Bhalla on February 6, 2017 at 8:00am — No Comments
Today’s customers are socially driven and more value conscious than ever before. Believe it or not, everyday customer interactions create a whopping 2.5 exabytes of data, which is equal to 2,500,000 terabytes, and this figure has been predicted to grow by 40 percent with every passing year. As organisations face the…Continue
Added by Ronald van Loon on February 6, 2017 at 8:00am — No Comments
As we all know, CRISP-DM, which stands for Cross Industry Standard Process for Data Mining, is a process model that outlines the most common approach to tackling data-driven problems. Per the poll conducted by KDnuggets in 2014, this was and “is” one of the most popular and most widely used methodologies. This method of gleaning insights from data is very dear to industry experts and data miners.
As the title suggests, I will align some of the most useful R packages with this most popular and…Continue
Target Corporation’s massively profitable data science project threw the company into the news spotlight a few years back. Its story makes for a valuable case study in bridging data science and business intuition.
After having painstakingly developed a ‘golden-goose’ analytic model that could flag pregnant shoppers based on seemingly normal purchase patterns,…Continue
Added by David Stephenson on February 5, 2017 at 10:30pm — No Comments
Here is a nice summary of traditional machine learning methods, from Mathworks.
I also decided to add the picture below, as it illustrates a method that was very popular 30 years ago but that seems to have been forgotten recently: the mixture of Gaussians. In the example below, it is…Continue
Six Sigma is a quantitative approach to solving certain types of problems. At the root of Six Sigma is an improvement methodology that can be described by the acronym DMAIC: define, measure, analyze, improve, and control. Those interested in reading up on Six Sigma might consider the book for dummies, which I found fairly succinct. Those wondering what I mean by "certain types of problems" should consider how to apply the approach to their own business circumstances. I…Continue
Added by John Mount on February 4, 2017 at 3:30pm — No Comments
Monday newsletter published by Data Science Central. Previous editions can be found here. The contribution flagged with a + is our selection for the picture of the week.
Added by Vincent Granville on February 4, 2017 at 11:30am — No Comments
Guest blog post by Brian Back.
Of the wide range of things you can do with D3, one of the best things to make is still the time series plot. In this post, I’ll walk through the basics of making a multi-column point plot/scatter plot. We’ll use a GISS dataset from NASA; the dataset can be found …Continue
Added by Vincent Granville on February 4, 2017 at 11:00am — No Comments
In very simple terms, a business model is how you plan to make money from your business.
A refined version is how you create and deliver value to customers. Your strategy tells you where you want to go and the business model tells you how you are going to do it.
In this era of Industry 4.0 and digital transformation, businesses are getting disrupted faster than they get established. We all know what Apple did for music, Uber did for taxis and Airbnb did for…
Added by Sandeep Raut on February 4, 2017 at 7:00am — No Comments
“Half the money I spend on advertising is wasted; the trouble is I don't know which half.”
– John Wanamaker
The sale of a house is a valuable event for many parties. Real estate brokers, mortgage originators, moving companies – these businesses and more would greatly benefit from being able to get out in front of their competitors in…Continue
Businesses today need to do more than merely acknowledge big data. They need to embrace data and analytics and make them an integral part of their company. Of course, this will require building a quality team of data scientists to handle the data and analytics for the company. Choosing the right members for the team can be difficult,…
Added by Ronald van Loon on February 3, 2017 at 6:00am — No Comments
The development of artificial intelligence (AI) has had a huge influence on today’s society, as ongoing discussions evaluate the impacts of creating machines and computer systems that can react and perform like humans. These systems can process information in a more cognitive way, making them capable of more human-like functions like learning, decision-making, and visual perception.…Continue
Linear regression is one of the most widely used statistical models. If Y is a continuous variable, i.e. one that can take decimal values, and is expected to have a linear relationship with the X variables, that relationship can be modeled with linear regression. It is usually the first model to fit when we plan to develop a model for forecasting Y or to build hypotheses about the effect of the Xs on Y.
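To make the idea of a linear relation between Y and X concrete, here is a minimal sketch of an ordinary least-squares fit with a single predictor, in plain Python. It is not taken from the post itself: the function names and the toy data are made up for illustration, and a real analysis with several X variables would normally rely on a statistics library.

```python
from statistics import mean


def fit_simple_linear_regression(xs, ys):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("xs and ys must have the same length (at least 2)")
    x_bar = mean(xs)
    y_bar = mean(ys)
    # Sum of squared deviations of x, and of cross-deviations of x and y.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("xs must not all be identical")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def predict(intercept, slope, x):
    """Forecast Y for a new value of X using the fitted line."""
    return intercept + slope * x


# Hypothetical toy data that lies exactly on the line y = 1 + 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
intercept, slope = fit_simple_linear_regression(xs, ys)
forecast = predict(intercept, slope, 5.0)
```

Because the toy data is perfectly linear, the fit recovers the line it was generated from; with noisy data the same formulas give the line that minimises the sum of squared residuals.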
Added by Jishnu Bhattacharya on February 1, 2017 at 8:30pm — No Comments
Most articles on extreme events focus on the extreme values. Very little has been written about the arrival times of these events. This article fills the gap.
We are interested here in the distribution of arrival times of successive records in a time series, with potential applications to global warming assessment, sport analytics, or high frequency trading. The purpose here is to discover what the distribution of these arrival times is, in the absence of any…Continue
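As a rough illustration of what "arrival times of successive records" can mean, the sketch below scans a series and notes the positions at which a new record high occurs. It is an assumption-laden example rather than the article's method: it treats a record as a value strictly greater than all earlier ones, counts the first observation as a record, and uses made-up data and function names.

```python
def record_arrival_times(series):
    """Return the 1-based positions at which a new record high occurs.

    Assumption: a record is a value strictly greater than every earlier value,
    and the first observation counts as a record.
    """
    times = []
    current_record = None
    for t, value in enumerate(series, start=1):
        if current_record is None or value > current_record:
            times.append(t)
            current_record = value
    return times


def inter_record_gaps(times):
    """Return the waiting times between successive records."""
    return [later - earlier for earlier, later in zip(times, times[1:])]


# Hypothetical observations, for illustration only.
observations = [3.1, 2.4, 3.5, 3.5, 1.0, 4.2, 4.0, 5.7]
times = record_arrival_times(observations)  # positions 1, 3, 6 and 8
gaps = inter_record_gaps(times)             # waiting times 2, 3 and 2
```

Applying the same functions to many simulated series would give an empirical picture of how these arrival times are distributed.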
This cheat sheet, along with explanations, was first published on DataCamp. Click on the picture to zoom in. To view other cheat sheets (Python, R, Machine Learning, Probability, Visualizations, Deep Learning, Data Science, and so on) click here.
To view a…Continue
Added by Emmanuelle Rieuf on February 1, 2017 at 10:00am — No Comments
Journey Science, being derived from connected data from different customer activities, has become pivotal for the telecommunications industry, providing the means to drastically improve the customer experience and retention. It has the ability to link together scattered pieces of data, and enhance a telco business’s objectives. Siloed…
According to Experian, when it comes to data inaccuracy, much of it is down to human error, in particular, spelling mistakes. The reason for this lies in an over-reliance on manual data entry and the lack of…Continue
Added by Martin Doyle on February 1, 2017 at 5:30am — No Comments