Featured Blog Posts – September 2019 Archive (61)

The problem that Google Solved with Quantum Supremacy

The problem has to do with sampling, random numbers and probability distributions, so it is of interest to our community. As Scott Aaronson describes it in his blog, here is the problem:

You can read more here, including answers to…


Added by Capri Granville on September 22, 2019 at 1:30pm — No Comments

The Math of Machine Learning - Berkeley University Textbook

This document is an attempt to provide a summary of the mathematical background needed for an introductory class in machine learning, which at UC Berkeley is known as CS 189/289A.

Our assumption is that the reader is already familiar with the basic concepts of multivariable calculus and linear algebra (at the level of UCB Math 53/54). We emphasize that this document is not a replacement for the prerequisite classes. Most subjects presented here are covered rather minimally; we intend…


Added by Capri Granville on September 22, 2019 at 11:30am — No Comments

Weekly Digest, September 23

Monday newsletter published by Data Science Central. Previous editions can be found here. The contribution flagged with a + is our selection for the picture of the week. To subscribe, follow this link.  



Added by Vincent Granville on September 22, 2019 at 11:30am — No Comments

Correlation Coefficients in One Picture

Correlation coefficients enable you to find relationships between a wide variety of data. However, the sheer number of options can be overwhelming. This picture sums up the differences between five of the most popular correlation coefficients.…


Added by Stephanie Glen on September 22, 2019 at 6:00am — No Comments

The simplest explanation of machine learning you’ll ever read

This article was written by Cassie Kozyrkov.


You’ve probably heard of machine learning and artificial intelligence, but are you sure you know what they are? If you’re struggling to make sense of them, you’re not alone. There’s a lot of buzz that makes it hard to tell what’s science and what’s science fiction. Starting with the names…


Added by Andrea Manero-Bastin on September 21, 2019 at 9:00am — No Comments

AWK -- a Blast from Wrangling Past.

I recently came across an interesting account by a practical data scientist on how to munge 25 TB of data. What caught my eye at first was the article's title: "Using AWK and R to parse 25tb". I'm a big R user now and made a living with AWK 30 years ago as a budding data analyst. I also empathized with the author's recountings of…


Added by steve miller on September 21, 2019 at 5:30am — 2 Comments

Explaining Logistic Regression as Generalized Linear Model (in use as a classifier)

The explanation of Logistic Regression as a Generalized Linear Model, and of its use as a classifier, is often confusing.

In this article, I try to explain this idea from first principles. This blog is part of my forthcoming book on the Mathematical foundations of Data Science. If you are interested in knowing more, please follow me on LinkedIn: Ajit Jaokar

We take the following approach:

  • We see first briefly how…

Added by ajit jaokar on September 20, 2019 at 11:36am — No Comments

Applications of Data Analytics

I am presenting at the upcoming NISS (National Institute of Statistical Sciences) webinar on September 27. NISS was my first employer in the US, back in 1996, when I was completing a post-doc.

My presentation focuses on new algorithms, original applications, theoretical data science (including a new conjecture about data sets) and implications for business analytics, as well as new foundations of statistics, based on general resampling and model-free, data-driven techniques. It will also…


Added by Vincent Granville on September 19, 2019 at 9:30am — No Comments

How AI/ML Could Return Manufacturing Prowess Back to US

I grew up in a small manufacturing town in Northeast Iowa.  The factory in my hometown made tractors (no surprise given that it was Iowa), but eventually the economics of cheap foreign labor and an interconnected global economy caught up with that factory – as it did with many US-based manufacturers – and soon the factory closed, and many people were laid off.

But the technology world continues to evolve – especially with respect to IoT, Data Science and AI/ML – and so…


Added by Bill Schmarzo on September 19, 2019 at 8:45am — No Comments

Introduction to Authorship Analysis as a Text Classification/Clustering Problem

Guest blog post by Nabanita Roy.


The art and science of discriminating between writing styles of authors by identifying the characteristics of the persona of the authors and examining articles authored by them is called Authorship Analysis. It aims to determine characteristics of an individual like age, gender, native language and personality traits…


Added by Vincent Granville on September 18, 2019 at 3:02pm — No Comments

MS Data Science vs MS Machine Learning / AI vs MS Analytics

Q1. Is an MSc in Data Science/Data Analytics the same as ML/AI, given that some universities offer Data Science but not AI?

Q2. I am interested in MS Data Science and not MS Analytics, as the latter is not technical in nature. Are MS Data Science and MS Data Analytics the same?

Q3. How to Choose Between a Master’s in Data Analytics vs Business…


Added by Tanmoy Ray on September 17, 2019 at 9:30am — No Comments

Boosting your Machine Learning productivity with SAS Viya

By Stefan Stoyanov, Business Analytics & Research Intern at Boemska

I started my MSc Business Analytics course at the University of Surrey almost one year ago. I had no prior experience in Machine Learning or data science. Before, I used to develop and manage EU projects…


Added by Stefan Stoyanov on September 17, 2019 at 4:30am — No Comments

Water Data Provides Ground-Level Insight into Business Risk

In May 2014, Milwaukee experienced 82 water main breaks in five days, sending thousands of people scrambling for water and costing the city hundreds of thousands of dollars in infrastructure repair and property damage. 

The series of breaks – although…


Added by Lewis Wynne-Jones on September 16, 2019 at 5:30am — No Comments

IoT - where are the stream analytics use cases?

I have been looking at this problem for a few years now.
The IoT industry often speaks of handling both high volumes and high throughputs of data.
However, I currently find few use cases for IoT streaming analytics that are unique.
The 'unique' and 'currently' bits…

Added by ajit jaokar on September 14, 2019 at 8:45am — No Comments

Weekly Digest, September 16

Monday newsletter published by Data Science Central. Previous editions can be found here. The contribution flagged with a + is our selection for the picture of the week. To subscribe, follow this link.  


  • Building…

Added by Vincent Granville on September 14, 2019 at 8:30am — No Comments

Difference Between Stratified Sampling, Cluster Sampling, and Quota Sampling

What is the Difference Between Stratified Sampling and Cluster Sampling?

The main difference between stratified sampling and cluster sampling is that with cluster sampling, you have natural groups separating your population. For example, you might be able to divide your data into natural groupings like city blocks, voting districts or school…
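The contrast in the excerpt above can be illustrated with a minimal sketch. It uses the standard textbook definitions (stratified sampling draws some units from every group, cluster sampling draws a few whole groups and keeps all their units), which the truncated excerpt does not spell out. The city-block data and the helper names `stratified_sample` and `cluster_sample` are hypothetical and chosen only for illustration.

```python
import random
from collections import defaultdict


def group_units(units, group_of):
    """Bucket units by their group label (e.g. city block)."""
    groups = defaultdict(list)
    for unit in units:
        groups[group_of(unit)].append(unit)
    return groups


def stratified_sample(units, group_of, fraction, rng):
    """Draw a random fraction of units from EVERY group (stratum)."""
    sample = []
    for members in group_units(units, group_of).values():
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample


def cluster_sample(units, group_of, n_clusters, rng):
    """Pick a few whole groups (clusters) at random and keep ALL their units."""
    groups = group_units(units, group_of)
    chosen = rng.sample(sorted(groups), n_clusters)
    return [unit for label in chosen for unit in groups[label]]


# Hypothetical population: 4 city blocks with 10 households each.
households = [(block, i) for block in "ABCD" for i in range(10)]
block_of = lambda household: household[0]
rng = random.Random(0)

strat = stratified_sample(households, block_of, fraction=0.2, rng=rng)
clus = cluster_sample(households, block_of, n_clusters=2, rng=rng)

print(len(strat), sorted({b for b, _ in strat}))  # 2 households from each block
print(len(clus), sorted({b for b, _ in clus}))    # every household from 2 blocks
```

The stratified sample touches every block but only a few households in each, while the cluster sample covers only two blocks but includes every household in them.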


Added by Stephanie Glen on September 14, 2019 at 5:00am — No Comments

The Data Lake Chronicles: Pitching Through Pain, Vampire Indecisions and Second Surgeries

There is a phrase in baseball about pitchers “pitching through pain” that refers to pitchers taking the mound to pitch even though they have aches and pains – sore arms, stiff joints, blisters, strained muscles, etc. The idea is that these pitchers are so tough that they can pitch effectively even though they are not quite physically right. 

However, when the human system is asked to do…


Added by Bill Schmarzo on September 13, 2019 at 7:35am — No Comments

Calculating Price Elasticity Through KNIME

As per Wikipedia, Price Elasticity of Demand (PED or ED or PE) is a measure used in economics to show the responsiveness, or change, of the quantity demanded of a good or service to a change in its price when nothing but the price changes. In business terms, it helps identify the products whose sales are more or less susceptible to price changes. Since demand generally falls as price rises, it is imperative to know this information for…
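As a small worked example of the definition above, the sketch below computes elasticity as the percentage change in quantity divided by the percentage change in price. It uses the midpoint (arc) formula, which is one common convention and not necessarily the method used in the KNIME workflow; the function name `price_elasticity` and the sample numbers are assumptions made for illustration.

```python
def price_elasticity(q0, q1, p0, p1):
    """Midpoint (arc) price elasticity of demand.

    q0, q1: quantity demanded before and after the price change
    p0, p1: price before and after the change
    """
    if p0 == p1:
        raise ValueError("price did not change; elasticity is undefined")
    pct_change_quantity = (q1 - q0) / ((q0 + q1) / 2)
    pct_change_price = (p1 - p0) / ((p0 + p1) / 2)
    return pct_change_quantity / pct_change_price


# Hypothetical product: price rises from 10 to 12, sales fall from 100 to 80.
ped = price_elasticity(q0=100, q1=80, p0=10.0, p1=12.0)
print(round(ped, 3))
```

A magnitude above 1 is usually read as elastic demand (sales react strongly to price), and a magnitude below 1 as inelastic demand.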


Added by saurabh ajmera on September 13, 2019 at 5:41am — No Comments

The 2019 Guide to Conquer Data Breaching

How secure is your data?

What measures do you take to hide your confidential data?

Can you confidently say that data breaching stands nowhere close to your security?

Data breaching is not new, nor will it disappear.

As technology rises with new developments,…


Added by Racheal Chapman on September 12, 2019 at 11:30pm — No Comments

10 Visualizations Every Data Scientist Should Know

Ancient ruins are sometimes discovered after long years investigating regions of the world covered by dense jungle or giant forests. The feeling of an archaeologist at that moment of discovery gives a window into the feeling data scientists often have when getting a view of their data — through visualizations — that clarifies a key aspect of the analysis.…


Added by Jorge Castanon on September 12, 2019 at 12:00pm — No Comments

