R is the world's most widely used programming language for statistical analysis, predictive modeling and data science. Its popularity is reported in many recent surveys and studies. The R programming language is becoming more powerful day by day as the number of supported packages grows. Some big IT companies, such as Microsoft and IBM, have also started developing packages for R and offering enterprise versions of R.
Added by Deepanshu Bhalla on June 12, 2017 at 12:30am — No Comments
This article explains how to select important variables using the Boruta package in R. Variable selection is an important step in a predictive modeling project. It is also called 'feature selection'. Every private and public agency has started tracking data and collecting information on various attributes. This results in access to too many predictors for a predictive model. But not every variable is important for the prediction of a particular task. Hence it is essential to…Continue
It’s not easy for a retailer to face ongoing economic challenges. The power of the customers is rising. They have the right to choose the best, and they are not happy with anything less. The competition in every single industry is brutal. We’re not exaggerating when we say that every business battles for survival.
In this war of competitors, data analytics is the most effective weapon. In February 2017, JDA Software Group and PwC (PricewaterhouseCoopers) released…
Added by Robert Morris on May 29, 2017 at 5:30am — No Comments
Summary: Someone had to say it. In my opinion R is not the best way to learn data science and not the best way to practice it either. More and more large employers agree.
“If (there) was one thing all people took for granted, (it) was conviction that if you feed honest figures into a computer, honest figures (will) come out. Never doubted it myself till I met a computer with a sense of humor.”
― Robert A. Heinlein, The Moon is a Harsh Mistress
This post is the first in a series of articles in which we will explain what Machine Learning is. You don’t have to have formal training or…Continue
(Photo credit: Rob Lavinsky, iRocks.com – CC-BY-SA-3.0)
In 1945, Count Richard Taaffe*, a Dublin gem collector, was sorting through a set of spinel gems that he had bought, and found one…Continue
Added by Peter Bruce on March 30, 2017 at 2:30pm — No Comments
This is a project I've been working on for some time to help improve the missed opportunity rate (no-show rate) at all medical centers. It demonstrates how to extract datasets from an SQL server and load them directly into an R environment. It also demonstrates the entire machine learning process, from engineering new features, through tuning and training the model, to finally measuring the model's performance. I would like to share my results and methodology as a guide to help…Continue
Added by James Marquez, MBA, PMP on March 21, 2017 at 8:30am — No Comments
Summary: Count yourself lucky if you’re not in one of the regulated industries where regulation requires you to value interpretability over accuracy. This has been a serious financial weight on the economy but innovations in Deep Learning point a way out.
As Data Scientists we tend to take as gospel that more accuracy is better. There…Continue
Added by William Vorhies on February 28, 2017 at 9:21am — No Comments
In this post, we consider different approaches for time series modeling. Forecasting approaches using linear models, the ARIMA algorithm, and the XGBoost machine learning algorithm are described. Results of different model combinations are shown. For probabilistic modeling, approaches using copulas and Bayesian inference are considered.
Time series analysis, especially forecasting, is an important problem of modern…Continue
Accurate multichannel campaign attribution has stumped the online marketing industry for years. But what if the solution is to stop worrying about attribution, and move to an optimization-driven approach?
You know those photo mosaic images, which suddenly became terribly popular a few years back? They cleverly use lots of individual tiny images to make up one large image. If you look closely you can make out the…Continue
Added by Ian Thomas on January 27, 2017 at 9:30am — No Comments
R is a free programming language for data analysis, statistical modeling and visualization. It is one of the most popular tools in the predictive modeling world, and its popularity is growing day by day. In the 2016 data science salary survey conducted by O'Reilly, R was ranked second in the category of programming languages for data science (SQL ranked first). In another popular poll, the KDnuggets Analytics software survey, R took the top rank with 49% of the vote. These survey polls answer the question about…Continue
Added by Deepanshu Bhalla on January 1, 2017 at 9:30am — No Comments
At Grakn Labs we love technology. Here is our December 15th edition by Filipe Pinto Teixeira, where he looked back at Predictive Analytics.
A popular phrase tossed around when we talk about statistical data is “there is correlation between variables”. However, many people wrongly consider this to be the equivalent of “there is causation between variables”. It’s important to explain the distinction: correlation means that once we know how one variable changes, we can make reasonable deductions about how other variables change. There are several variants of correlation:
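As a small illustration of what "knowing how one variable changes" can look like in numbers, here is a minimal sketch of the Pearson coefficient. Pearson is an illustrative choice; the excerpt does not say which variants the original post covers. The function name `pearson` and the sample data are hypothetical.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the mean (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalized variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical data: one series rises with x, the other falls with x.
x = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]
falling = [10, 8, 6, 4, 2]

r_up = pearson(x, rising)
r_down = pearson(x, falling)
print(r_up, r_down)
```

A coefficient near +1 or -1 only says the two series move together linearly. It says nothing about whether one causes the other, which is the distinction drawn above.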
Data analytics is a mature discipline at this point, and even those outside the data science world generally understand what it’s all about. Modern data science, however, is still new enough to spur questions. Vincent Granville, Executive Data Scientist at Data Science Central, spoke with Roy Wilds, Chief Data Scientist from PHEMI, a Vancouver-based big data startup, about the best way to educate people…Continue
Added by Roy Wilds, PhD, PHEMI Systems on December 6, 2016 at 8:00am — No Comments
Classification is one of the most typical tasks in machine learning. It may seem that evaluating the effectiveness of such a model is easy. Let’s assume that we have a model which, based on historical data, predicts whether a client will pay back credit obligations. We evaluate 100 bank customers and our model guesses correctly in 93 instances. That may appear to be a good result – but is it really? Should we consider a model with 93% accuracy as adequate?
It depends. Today, we…Continue
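One way to see why 93% accuracy may not be enough is a minimal sketch under an assumed class balance. The excerpt does not say how many of the 100 customers actually repaid, so the 93/7 split below is hypothetical, as are the helper names `accuracy` and `recall`.

```python
# Hypothetical data: 93 customers repaid (label 1), 7 defaulted (label 0).
actual = [1] * 93 + [0] * 7

# A trivial "model" that ignores its input and always predicts repayment.
predicted = [1] * len(actual)

def accuracy(y_true, y_pred):
    """Share of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def recall(y_true, y_pred, positive_class):
    """Share of true `positive_class` cases that the model identified."""
    relevant = [p for t, p in zip(y_true, y_pred) if t == positive_class]
    return sum(p == positive_class for p in relevant) / len(relevant)

acc = accuracy(actual, predicted)
default_recall = recall(actual, predicted, positive_class=0)
print(f"accuracy: {acc:.2f}, recall on defaulters: {default_recall:.2f}")
```

Under this assumed split, the trivial model reaches the same 93% accuracy while catching none of the defaulters, so accuracy alone cannot show whether a classifier is useful.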
Added by Algolytics on November 13, 2016 at 4:30am — No Comments
In the previous post of our Understanding machine learning series, we presented how machines learn through multiple experiences. We also explained how, in some cases, human beings are much better at interpreting data than machines. In many tasks machines still can’t replace humans, who understand surrounding reality better and can make more accurate decisions.
Machines can be given a…Continue
Added by Algolytics on October 13, 2016 at 4:30am — No Comments
Here is our new selection of articles and resources featured today, pertaining to data science and related fields.
Added by Vincent Granville on October 12, 2016 at 7:02pm — No Comments
What is Data Mining?
Data mining is an integrated application in the Data Warehouse and describes a systematic process for pattern recognition in large data sets to identify conclusions and relationships. Using statistical methods, or genetic algorithms, data files can be automatically searched for statistical anomalies, patterns or rules.
Wikipedia defines Data Mining as “Data mining is an interdisciplinary subfield of computer science. It is the computational process…Continue
In a prior post I outlined some thoughts on the outlook for the data analytics sector and referenced a database I prepared of analytics companies. At the time the list comprised about 400 names categorized into a number of sectors and segments.
I’ve continued to update the list since that time and it now comprises about 800 companies.
This article on going deeper into regression analysis with assumptions, plots and solutions was posted by Manish Saraswat. Manish, who works in marketing and data science at Analytics Vidhya, believes that education can change this world. R, data science and machine learning keep him busy.
Regression analysis marks the first step in predictive modeling. No doubt, it’s fairly easy to implement. Neither its syntax nor its parameters create any kind of confusion. But,…Continue
Added by Emmanuelle Rieuf on September 8, 2016 at 12:00pm — No Comments