Added by Ashish kumar on April 10, 2016 at 10:24am — No Comments
Some of the rarely shared trade secrets in machine learning (original post on LinkedIn):
1. Bootstrap sampling & the magic number 0.63
Even though randomly sampled,…
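The excerpt is cut off before it explains the number, so the following is only a minimal sketch of the usual reasoning, not the author's own derivation. When n items are drawn with replacement from n items, the chance that a given item appears at least once is 1 - (1 - 1/n)^n, which approaches 1 - 1/e ≈ 0.632 as n grows. The function names below are hypothetical and chosen for illustration.

```python
import math
import random


def expected_unique_fraction(n: int) -> float:
    """Probability that a given item shows up at least once in n draws with replacement."""
    return 1.0 - (1.0 - 1.0 / n) ** n


def simulated_unique_fraction(n: int, seed: int = 0) -> float:
    """Draw one bootstrap sample of size n and return the share of distinct items in it."""
    rng = random.Random(seed)
    sample = [rng.randrange(n) for _ in range(n)]
    return len(set(sample)) / n


if __name__ == "__main__":
    n = 10_000
    print("closed form:", expected_unique_fraction(n))
    print("simulation: ", simulated_unique_fraction(n))
    print("limit 1-1/e:", 1.0 - 1.0 / math.e)
```

Under these assumptions, roughly 63% of the original rows end up in each bootstrap sample, and the remaining rows are left out of it.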
Added by Ashish kumar on April 1, 2016 at 9:00am — No Comments
"Abstract Tree boosting is a highly effective and widely used machine learning method. In this paper, we describe a scalable end-to-end tree boosting system called XGBoost, which is used widely by data scientists to achieve state-of-the-art results on many machine learning challenges. We propose a novel sparsity-aware algorithm for sparse data and…Continue
AirBnB New User Bookings was a popular recruiting competition that challenged Kagglers to predict the first country where a new user would book travel. This was the first recruiting competition on Kaggle with scripts enabled. AirBnB…Continue
Added by Diego Marinho de Oliveira on March 10, 2016 at 2:30am — No Comments
One of the hot topics in machine learning is, without a doubt, feature engineering. In fact, it predates the current buzz: it was already central when we talked simply about data mining. Recall the CRISP-DM process: feature engineering (and, consequently, feature selection) is the core of a great data mining project. It comes to life in the Data Preparation phase, which covers constructive data preparation operations such as the production of derived attributes or entire new records, or…Continue
Added by Leandro Guerra on February 15, 2016 at 12:30am — No Comments
UPDATE: Mar 20, 2016 - Added my new follow-up course on Deep Learning, which covers ways to speed up and improve vanilla backpropagation: momentum and Nesterov momentum, adaptive learning rate algorithms like AdaGrad and RMSProp, utilizing the GPU on AWS EC2, and stochastic batch gradient descent. We look at TensorFlow and Theano starting from the basics - variables, functions, expressions, and simple optimizations - from there, building a neural network seems simple! …Continue
This is no big surprise as all the past reports have pointed towards this growth and expansion -…Continue
Added by Bruce Robbins on January 3, 2016 at 5:00am — No Comments
Last week witnessed a number of exciting announcements from the big data and machine learning space. What it shows is that there are still lots of problems to solve in 1) working with/deriving insights from big data, 2) integrating insights into business processes.
Probably the biggest (data) headline was that Google open sourced TensorFlow, their graph-based…Continue
Added by Brian Rowe on November 17, 2015 at 6:02am — No Comments
Unsupervised learning algorithms are machine learning algorithms that work without a desired output label. A supervised machine learning algorithm typically learns a function that maps an input x into an output y, while an unsupervised learning algorithm simply analyzes the x’s without requiring the y’s. Essentially, the algorithm attempts to estimate the underlying structure of the population of x’s (in other…Continue
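To make the contrast concrete, here is a small sketch in plain Python. It assumes one-dimensional data, and the two techniques (a least-squares line for the supervised case, a two-cluster k-means for the unsupervised case) are picked for illustration rather than taken from the post. The function names are hypothetical.

```python
from statistics import mean


def fit_supervised(xs, ys):
    """Least-squares line: learns a mapping x -> y from labeled (x, y) pairs."""
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return slope, my - slope * mx


def cluster_unsupervised(xs, iterations=10):
    """1-D k-means with k=2: groups the x's by proximity, using no labels at all."""
    centers = [min(xs), max(xs)]  # simple deterministic starting points
    for _ in range(iterations):
        groups = [[], []]
        for x in xs:
            nearest = 0 if abs(x - centers[0]) <= abs(x - centers[1]) else 1
            groups[nearest].append(x)
        # Keep the old center if a group happens to be empty.
        centers = [mean(g) if g else c for g, c in zip(groups, centers)]
    return centers


if __name__ == "__main__":
    print(fit_supervised([1, 2, 3, 4], [3, 5, 7, 9]))
    print(cluster_unsupervised([1.0, 1.2, 0.8, 5.0, 5.2, 4.8]))
```

The first function needs the y's to learn anything; the second only ever sees the x's and reports where they concentrate.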
Added by Aureus Analytics on November 16, 2015 at 10:00pm — No Comments
Added by Neuza Nunes on October 23, 2015 at 11:57am — No Comments
Added by Demnag on September 13, 2015 at 8:07pm — No Comments
Neural networks require considerable time and computational firepower to train. Previously, researchers believed that neural networks were costly to train because gradient descent slows down near local minima or saddle points. At the RE.WORK Deep…Continue
Added by Sophie Curtis on September 3, 2015 at 8:59am — No Comments
Hello and Welcome back!
This series is my attempt to start cataloging all the interesting articles, industry reports, whitepapers, and news that I read every month, related to technology and data science. We are at Month 2 and let us dig right in -
This essay titled "…Continue
Added by Srividya Kannan Ramachandran on August 17, 2015 at 5:30am — No Comments
Added by Vozag on August 6, 2015 at 9:30pm — No Comments
Alan Turing was the first to present the idea of a machine simulating thinking. It has been more than 60 years since his groundbreaking paper, "Computing Machinery and Intelligence", introduced the imitation game. The world has changed rapidly since then.
Today's machines have become very powerful. They can actually think, which supports the idea Alan Turing presented in the 1950s. However, machine thinking may be different. Alan Turing argued, just because the thinking can be…Continue
We’ve created a Domino project with starter code in R and Python for participating in the Data Science Bowl.
Get a jump start in the competition with our starter project by training your models on massive hardware and running multiple experiments in parallel while keeping track of…Continue
Added by Anna Anisin on January 13, 2015 at 3:00pm — No Comments
We all know that calculating error bounds on metrics derived from very large data sets has been problematic for a number of reasons. In more traditional statistics one can put a confidence interval or error bound on most metrics (e.g., mean), parameters (e.g., slope in a regression), or classifications (e.g., confusion matrix and the Kappa statistic).
For many machine learning applications, an error bound could be very important.…Continue
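As a reminder of what the "traditional" case looks like, here is a minimal sketch of a confidence interval for a mean. It assumes the normal approximation (z = 1.96 for roughly 95% coverage) and a sample that is reasonably large and independent; the function name is hypothetical, and this is not necessarily the approach the post goes on to discuss.

```python
import math
from statistics import mean, stdev


def mean_confidence_interval(values, z=1.96):
    """Normal-approximation confidence interval for the mean of a sample."""
    m = mean(values)
    # Standard error of the mean, using the sample standard deviation.
    half_width = z * stdev(values) / math.sqrt(len(values))
    return m - half_width, m + half_width


if __name__ == "__main__":
    low, high = mean_confidence_interval([2, 4, 4, 4, 5, 5, 7, 9])
    print(f"mean is within [{low:.3f}, {high:.3f}] at roughly 95% confidence")
```

The interval shrinks with the square root of the sample size, so quadrupling the data halves the error bound.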
Added by Anna Anisin on December 14, 2014 at 3:33pm — No Comments
When you use Twitter, how do you know when you are being presented with something credible instead of something totally bogus? The answer is that, unless you spend a lot of time researching each tweet, you probably don't. However, one thing is certain: we rely on what we read on Twitter to be true.
Twitter is one of the fastest and most effective ways we disseminate news across our world. If this…Continue
Added by Renette Youssef on December 8, 2014 at 4:00pm — No Comments
This post is excerpted from DataScience Hacks by the author himself.
Apache Spark is another Apache-licensed top-level project that can perform large-scale data processing far faster than Hadoop (I am referring to MR1.0 here). This is possible because of the Resilient Distributed Dataset (RDD) concept behind its fast data processing. An RDD is basically a collection of objects,…Continue