Here we describe a simple methodology to produce predictive scores that are consistent over time and compatible across various clients, to allow for meaningful comparisons and consistency in actions resulting from these scores, such as offering a loan. Scores are used in various contexts, such as web page rankings in search engines, credit scores, risk scores attached to loans or credit card transactions, the risk that someone might become a terrorist, and more. Typically a score is a function…Continue
Added by Vincent Granville on February 18, 2019 at 9:30pm — No Comments
Summary: True prescriptive analytics requires the use of real optimization techniques that very few applications actually use. Here’s a refresher on optimization with examples of where and how they’re best used.
It all depends on the classes that you attended. Some are worth listing, some are best not to mention. Here I review a few of these data science curricula and the impression they can make on hiring managers, depending on your profile, work experience, and the strength (or lack thereof) of these programs.
In practice, the Data Scientist wants to know which formula to write in their Excel sheet once all the available data has been entered into it: Bayes’ formula or the usual one?
The answer is that it depends: if all the data is well…Continue
Added by Marcia Ricci Pinheiro on February 18, 2019 at 4:50am — No Comments
Monday newsletter published by Data Science Central. Previous editions can be found here. The contribution flagged with a + is our selection for the picture of the week. To subscribe, follow this link.
Added by Vincent Granville on February 17, 2019 at 9:30am — No Comments
Grab a copy of The Elements of Statistical Learning ("the machine learning bible") and you might be a little overwhelmed by the mathematics. For example, this equation (p.34), for a cubic smoothing spline, might send shivers down your spine if math isn't your forte:…Continue
This article was written by Jason Brownlee.
Artificial neural networks have two main hyperparameters that control the architecture or topology of the network: the number of layers and the number of nodes in each hidden layer. You must specify values for these parameters when configuring your network. The most reliable way to configure…Continue
Added by Andrea Manero-Bastin on February 17, 2019 at 2:00am — No Comments
This article was written by Enda Ridge.
Data Scientists need to communicate without jargon so customers understand, believe and care about their recommendations. Here is a Data Science jargon buster to help.
Data Science is a technical…Continue
Added by Andrea Manero-Bastin on February 17, 2019 at 1:30am — No Comments
This article was written by Vitaly Shmatikov.
Machine learning is eating the world. The abundance of training data has helped ML achieve amazing results for object recognition, natural language processing, predictive analytics, and all manner of other tasks. Much of this training data is very sensitive, including personal photos, search queries,…Continue
Added by Andrea Manero-Bastin on February 17, 2019 at 1:30am — No Comments
Bayesian probability is like a reaction to mathematical probability: what about our…Continue
Added by Marcia Ricci Pinheiro on February 16, 2019 at 11:30pm — No Comments
It wasn’t too long ago that somebody said to me, “You do reports when you get to doing them.” To me, this position is most defensible if the reports are for bookkeeping purposes. I pointed out one day that my reports are for management purposes, and for this reason timeliness is important. For instance, when one is driving a car, and it is necessary to turn at the next right, turning at the next right five lights later is fairly relevant. Timing counts. The “active” process of driving…Continue
Added by Don Philip Faithful on February 16, 2019 at 11:33am — No Comments
There are several technology and business forces in play that are going to derive and drive new sources of customer, product and operational value. As a setup for this blog on the Economic Value of Data Science, let’s review some of those driving forces.
Added by Bill Schmarzo on February 16, 2019 at 5:32am — No Comments
Previously, I tackled the Gambler's Ruin problem using conditional probability and difference equations as well as visualising the simulations of the problem in a random walk style using Python/Pygame. This can be found here: …Continue
Added by Tansel Arif on February 15, 2019 at 9:52am — No Comments
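The Gambler's Ruin problem mentioned above can be illustrated with a short simulation. This is a minimal sketch and not the author's code: it assumes a gambler who bets one unit per round, wins each round with probability p, and stops on reaching 0 or a fixed target. The function names `ruin_probability` and `simulate_ruin` are hypothetical. It compares the standard closed-form ruin probability with a Monte Carlo estimate.

```python
import random


def ruin_probability(start, target, p):
    """Closed-form probability of hitting 0 before `target`.

    Assumes unit bets, win probability `p` per round, 0 < start < target.
    """
    if p == 0.5:
        # Fair game: chance of reaching the target is start / target.
        return 1 - start / target
    r = (1 - p) / p  # ratio q / p
    win = (1 - r ** start) / (1 - r ** target)
    return 1 - win


def simulate_ruin(start, target, p, trials=5000, seed=0):
    """Monte Carlo estimate of the same ruin probability."""
    rng = random.Random(seed)  # seeded so runs are repeatable
    ruined = 0
    for _ in range(trials):
        stake = start
        # Play until the gambler is broke or reaches the target.
        while 0 < stake < target:
            stake += 1 if rng.random() < p else -1
        if stake == 0:
            ruined += 1
    return ruined / trials


if __name__ == "__main__":
    for p in (0.5, 0.45):
        exact = ruin_probability(5, 10, p)
        estimate = simulate_ruin(5, 10, p)
        print(f"p={p}: exact={exact:.4f}, simulated={estimate:.4f}")
```

With enough trials, the simulated frequency should settle close to the closed-form value, which is a quick sanity check on either derivation.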
If there was an AI winter, we are clearly at the peak of its summer. I do not know if we will ever build something like Skynet, but we are going to build much simpler things that will change the course of our lives. This shows a new application of analytics in the field of finance: a radical new approach that lets a new face of finance flourish, one never seen…Continue
Added by Ramon Serrallonga on February 15, 2019 at 6:06am — No Comments
Over the past decade, momentum from both academia and industry has lifted the R programming language, making it one of the most important tools for computational statistics, visualization, and data science.
Due to the growth of R in the data science community, there is a constant need to upgrade and develop both R and…Continue
Added by Divya Singh on February 14, 2019 at 10:32pm — No Comments
In this 5 Minute Analysis we'll preprocess, map, and explore complicated sales data for liquor stores in Iowa. Then we’ll extract the relevant latitude and longitude from a problematic column of the data and discover the city with the most sales. Next we’ll filter the data to that city and prepare the data for easy loading into Business Analysis tools such as Tableau and PowerBI. Finally…
Added by Benjamin Waxer on February 14, 2019 at 9:32am — No Comments
This is the first article in what will be a three-part series:
"How to make your mark on the world as a talented, socially conscious data scientist."
Added by Marshall Lincoln on February 13, 2019 at 5:50pm — No Comments
Many of the following statistical tests are rarely discussed in textbooks or in college classes, much less in data camps. Yet they help answer a lot of different and interesting questions. I used most of them without even computing the underlying distribution under the null hypothesis, but instead, using simulations to check whether my assumptions were plausible or not. In short, my approach to statistical testing is model-free, data-driven. Some are easy to implement even in Excel. Some of…Continue
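As an illustration of simulation-based testing in general, here is a minimal sketch of a permutation test for a difference in means. It is not one of the tests from the article, whose list is not shown here, and the function name `permutation_p_value` is hypothetical. The idea is to shuffle group labels many times and count how often the shuffled difference is at least as large as the observed one, so no distribution under the null hypothesis has to be derived.

```python
import random


def permutation_p_value(a, b, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the difference in group means."""
    rng = random.Random(seed)  # seeded so runs are repeatable
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_perm):
        # Reassign observations to the two groups at random.
        rng.shuffle(pooled)
        pa, pb = pooled[:len(a)], pooled[len(a):]
        diff = abs(sum(pa) / len(pa) - sum(pb) / len(pb))
        if diff >= observed:
            extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (extreme + 1) / (n_perm + 1)


if __name__ == "__main__":
    group_a = [12.1, 11.8, 12.5, 12.0, 11.9]  # made-up illustrative data
    group_b = [12.9, 13.1, 12.7, 13.4, 12.8]  # made-up illustrative data
    print(permutation_p_value(group_a, group_b))
```

A small p-value suggests the observed difference would be unusual if the group labels carried no information.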
An Introduction to Bayesian Reasoning
You might be using Bayesian techniques in your data science without knowing it! And if you're not, then it could enhance the power of your analysis. This blog post, part 1 of 2, will demonstrate how Bayesians employ probability distributions to add information when fitting models, and reason about uncertainty of the model's fit.
Grab a coin. How fair is the coin? What is the probability…Continue
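One common way to frame the coin question, assumed here rather than taken from the post itself, is to place a Beta prior on the probability of heads and update it with observed flips. The sketch below uses the standard Beta-Binomial conjugate update; the function names `update_beta`, `beta_mean` and `beta_variance` are hypothetical, and the flip counts are made up for illustration.

```python
def update_beta(alpha, beta, heads, tails):
    """Posterior Beta parameters after observing coin flips.

    A Beta(alpha, beta) prior combined with binomial data gives a
    Beta(alpha + heads, beta + tails) posterior.
    """
    return alpha + heads, beta + tails


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


def beta_variance(alpha, beta):
    """Variance of a Beta(alpha, beta) distribution."""
    total = alpha + beta
    return alpha * beta / (total ** 2 * (total + 1))


if __name__ == "__main__":
    # Uniform prior Beta(1, 1), then 7 heads and 3 tails (made-up data).
    a, b = update_beta(1, 1, heads=7, tails=3)
    print(f"posterior: Beta({a}, {b})")
    print(f"mean of P(heads): {beta_mean(a, b):.3f}")
    print(f"variance: {beta_variance(a, b):.4f}")
```

The prior parameters act like pseudo-counts of heads and tails, which is one way a Bayesian analysis adds information before any data is seen. The posterior variance shrinks as more flips are observed, reflecting reduced uncertainty about the coin.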
Professional athletes know the importance of developing opposing or complementary muscles (quadriceps and hamstrings, biceps and triceps). These complementary muscles are sets of muscles that “work together” to move your body in the most efficient ways. If these muscles are strengthened together, it creates a balance that can lead to optimal performance. However, if these muscles are not strengthened together, then one significantly increases the risk of…Continue
Added by Bill Schmarzo on February 13, 2019 at 4:44am — No Comments