To some, the term “business intelligence” may seem like a hilarious oxymoron, but the term is actually not used to describe how smart a business or its owner is. In...
Guest blog post by Dmitry Petrov. Originally posted here. There is a feature I really like in Apache Spark. Spark can process data out of memory in my local machine e...
Sometimes a correlation means absolutely nothing, and is purely accidental (especially when you compute millions of correlations among thousands of variables) or it can ...
Julia is a high-level dynamic programming language designed to address the requirements of high-performance numerical and scientific computing. It has been discussed as o...
By Greta Roberts. When beginning a new predictive analytics project, the client often mentions the importance of a “quick win”. It makes sense to think about deliver...
Unlike most other lists of top experts, this one is a hand-picked selection, not based on influence or Klout scores, or the number of Twitter followers and re-tweets, or...
This post is a summary of 3 different posts about outlier detection methods. One of the challenges in data analysis in general and predictive modeling in particular is ...
Today we are featuring the year’s most interesting breakthroughs in deep learning that we have been fawning over at Grakn Labs. (For those of you who are interested in ...
This blog was originally published on my website. If you have ever competed in a Kaggle competition, you are probably familiar with the use of combining different predi...
Python, R and SAS are the three most popular languages in data science. If you are new to the world of data science and aren’t experienced in any of these languages...