
All Blog Posts (3,358)

The Fundamental Statistics Theorem Revisited

In this article, we revisit the most fundamental statistics theorem, explained in layman's terms. We investigate a special but interesting and useful case that is not discussed in textbooks, data camps, or data science classes. This article is part of a series about off-the-beaten-path data science and mathematics, offering a fresh, original and simple perspective on a number of topics. Previous articles in this series can be found…


Added by Vincent Granville on December 5, 2016 at 6:30pm — No Comments

Warning! Get Ready for Regulations that Restrict Your Analytics

Summary: AI and predictive analytics are now so prevalent in our day-to-day lives that they have attracted the attention of the government, particularly regarding how certain groups might be adversely impacted. As data scientists we naturally want free rein to let the data speak as best it can, but as this report from the White House shows, we need to be prepared for some pushback.


The last 10 years have been glorious days for data science and…


Added by William Vorhies on December 5, 2016 at 4:08pm — No Comments

Weekly Digest, December 5

Monday newsletter published by Data Science Central. Previous editions can be found here.  The contribution flagged with a + is our selection for the picture of the week.

Featured Resources and Technical Contributions


Added by Vincent Granville on December 3, 2016 at 11:00am — No Comments

Statistical Mistakes and How to Avoid Them

This article was posted by Adrian Sampson on his own blog. Adrian is an assistant professor in the Department of Computer Science at Cornell University, where he is also part of the Computer Systems Laboratory.

Computer scientists in systemsy fields, myself included, aren’t great at using statistics. Maybe it’s because there are so many other potential…


Added by Emmanuelle Rieuf on December 2, 2016 at 7:00pm — No Comments

Massive Internet Attack Floods the World with Fake Data

Reddit is now at the center of this attack, which has impacted millions of top domains (most of the Internet) since November 30. While Reddit at first glance appears to be the perpetrator, it is actually the victim. This "behind the scenes" scheme, run from Russia, generates huge amounts of fake traffic: as much as 10% of all Internet traffic.

It is not caught by Google Analytics, and thus it results in phony web traffic statistics and flawed reports, which is the main issue people are…


Added by Vincent Granville on December 2, 2016 at 12:00pm — 2 Comments

Year in Review: Deep Learning Breakthroughs 2016

Today we are featuring the year’s most interesting breakthroughs in deep learning that we have been fawning over at Grakn Labs. (For those of you who are interested in a crash course in deep learning, here’s a great video by Andrew Ng at Stanford.)…


Added by Precy Kwan on December 2, 2016 at 3:00am — No Comments

Stacking models for improved predictions: A case study for housing prices

This blog was originally published on my website.

If you have ever competed in a Kaggle competition, you are probably familiar with combining different predictive models to improve accuracy, which can push your score up the leaderboard. While the technique is widely used, there are only a few resources that I am aware of where a clear description is available (one that I know of is …


Added by Burak Himmetoglu on December 1, 2016 at 6:00pm — No Comments
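The stacking idea in the post above can be sketched in a few lines of plain Python. This is a minimal illustration under our own assumptions, not the author's housing-price pipeline: a single numeric feature, two hypothetical base models (`fit_line`, a least-squares line, and `fit_knn`, a k-nearest-neighbour average) and a one-weight linear blender as the level-1 model. The blender is trained on out-of-fold predictions, so it never sees a base model's predictions on that model's own training rows. The data in the demo is synthetic.

```python
import random

def fit_line(xs, ys):
    """Closed-form least-squares fit of y = a*x + b for a single feature."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx if sxx else 0.0
    b = my - a * mx
    return lambda x: a * x + b

def fit_knn(xs, ys, k=3):
    """k-nearest-neighbour regressor: averages the k closest training targets."""
    pts = list(zip(xs, ys))
    def predict(x):
        nearest = sorted(pts, key=lambda p: abs(p[0] - x))[:k]
        return sum(y for _, y in nearest) / len(nearest)
    return predict

def out_of_fold(fit, xs, ys, folds=5):
    """Predict each point with a model trained without it (k-fold scheme)."""
    preds = [0.0] * len(xs)
    for f in range(folds):
        train = [i for i in range(len(xs)) if i % folds != f]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        for i in range(f, len(xs), folds):  # the held-out fold
            preds[i] = model(xs[i])
    return preds

def stack_two(fit_a, fit_b, xs, ys, folds=5):
    """Level-1 model: pick w in [0, 1] minimising the squared error of
    w*pa + (1-w)*pb on out-of-fold predictions, then refit both base
    models on all the data and blend them with that weight."""
    pa = out_of_fold(fit_a, xs, ys, folds)
    pb = out_of_fold(fit_b, xs, ys, folds)
    d = [a - b for a, b in zip(pa, pb)]
    den = sum(di * di for di in d)
    num = sum((y - b) * di for y, b, di in zip(ys, pb, d))
    w = min(1.0, max(0.0, num / den)) if den else 0.5
    ma, mb = fit_a(xs, ys), fit_b(xs, ys)
    return (lambda x: w * ma(x) + (1 - w) * mb(x)), w

if __name__ == "__main__":
    # Synthetic toy data (not real housing prices): linear trend plus a step.
    random.seed(0)
    xs = [random.uniform(0, 10) for _ in range(100)]
    ys = [3 * x + 5 * (x > 6) + random.gauss(0, 1) for x in xs]
    model, w = stack_two(fit_line, fit_knn, xs, ys)
    print(f"weight on linear model: {w:.2f}, prediction at x=7: {model(7):.1f}")
```

Training the blender on out-of-fold rather than in-sample predictions is the key design choice: in-sample, the more flexible k-NN model would look better than it really is and receive too much weight. With more base models, the single weight would be replaced by a regression over all of their predictions; scikit-learn's `StackingRegressor` packages that pattern.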

Analysis of 2 Million Hijacked Passwords (in Python)

Posted by Jianhua Li on GitHub. This was proposed as a data science project on Data Science Central, to challenge your data science skills on a real data set. Below is an overview. 

Basically one should try to answer the following three questions:

  • What are the most common patterns found in passwords?
  • Based on these…

Added by Emmanuelle Rieuf on November 30, 2016 at 7:00pm — 1 Comment
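For the first question, one simple notion of a "pattern" is the sequence of character classes in a password. The sketch below assumes that definition (our choice, not necessarily the one used in the GitHub project): each character is mapped to U, L, D or S (upper case, lower case, digit, symbol), consecutive runs are length-encoded, and `collections.Counter` tallies the resulting shapes. The function names are hypothetical.

```python
from collections import Counter
from itertools import groupby

def char_class(c):
    """Map a character to a class letter: U(pper), L(ower), D(igit), S(ymbol)."""
    if c.isupper():
        return "U"
    if c.islower():
        return "L"
    if c.isdigit():
        return "D"
    return "S"

def pattern(password):
    """Run-length-encoded class pattern, e.g. 'Summer2016!' -> 'U1L5D4S1'."""
    return "".join(
        f"{cls}{len(list(run))}" for cls, run in groupby(map(char_class, password))
    )

def common_patterns(passwords, n=10):
    """Return the n most frequent patterns as (pattern, count) pairs."""
    return Counter(pattern(p) for p in passwords).most_common(n)
```

Calling `common_patterns(passwords, n=10)` on a list of strings returns the ten most frequent shapes, written like `L6D2` (six lower-case letters followed by two digits). Collapsing runs keeps the number of distinct patterns small enough for the counts to be informative.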

Solving the Data Science Mystery

Data science has become an inevitable chapter in our everyday lives, where every action of ours is measured, plotted, classified and logged. We leave traces of who we are while driving a car, when visiting a place, after watching a movie or when shopping for what we want. These traces of data captured…


Added by Prakash Pasupathy on November 30, 2016 at 11:00am — No Comments

5 Business Models That Suit the Startups!

If there is an idea or a concept that the people around you are begging to turn into a business, why not go for it? Not only will you be doing what you love to do, but you will also be bringing in the green. A business model is an absolute must for any startup. But which one will you go…


Added by katey martin on November 29, 2016 at 10:30pm — No Comments

R for SQListas (2): Forecasting the Future

R for SQListas, part 2

Welcome to part 2 of my “R for SQListas” series. Last time, it was all about how to get started with R if you’re a SQL girl (or guy), and that basically meant an introduction to Hadley Wickham’s dplyr and the tidyverse. The logic being: Don’t fear, it’s not that different from what…


Added by Sigrid Keydana on November 29, 2016 at 11:30am — No Comments

Has AI Gone Too Far? - Automated Inference of Criminality Using Face Images

Summary: This new study claims to be able to identify criminals based on their facial characteristics. Even if the data science is good, has AI pushed too far into areas of societal taboos? This isn’t the first time data science has been restricted in favor of social goals, but this study may be a trip wire that starts a long and difficult discussion about the role of AI.


Has AI gone too far? This might seem like a nonsensical question to data…


Added by William Vorhies on November 29, 2016 at 10:28am — No Comments

App Development: Which Type of App Should I Get?

With more and more people browsing online from smartphones and tablets, it's no longer a question of whether a business or e-commerce site needs a mobile application, but rather of how to get it developed.

There are a few different options available for your mobile app development project, depending on the budget, target demographic and other factors, all outlined below. In short, there are 4 main routes to…


Added by Mark Pedersen on November 29, 2016 at 3:00am — No Comments

Why Do So Many Machine Learning Implementations Fail?

A recent article in TechCrunch describes issues at Twitter and Facebook: algorithms unable to detect fake news or hate speech. I wrote about how machine learning could be improved, and what can make implementations underperform…


Added by Vincent Granville on November 28, 2016 at 7:30pm — 1 Comment

Difference Between Data Scientists, Data Engineers, and Software Engineers - According To LinkedIn

This article was posted by Ryan Swanstrom on Data Science 101. Ryan is helping the world learn data science at Microsoft.

The differences between Data Scientists, Data Engineers, and Software Engineers can get a little confusing at times. Here is a guest post by Jake Stein, CEO of Stitch (formerly RJMetrics), which aims to clear up some of that confusion based on LinkedIn data.

As data grows, so does the expertise needed to manage it. The past few years…


Added by Emmanuelle Rieuf on November 28, 2016 at 6:00pm — No Comments

Your CRM data should reveal your future success (or demise)

Guest blog by Chris Rigatuso. Chris is Founder and Board Member at the Skyfollow Consulting Group. He earned his MBA from the Haas School of Business (UC Berkeley), and lives in the Bay Area. …


Added by Emmanuelle Rieuf on November 28, 2016 at 5:30pm — No Comments

Machine learning as a service? Might lose sleep over this!

This post is not intended to teach people how to use popular predictive modelling APIs for free, although, surprisingly, that isn't a far-fetched possibility. Trained machine learning models are basically functions that map feature vectors to the output variable. Upon querying with a test…


Added by Kumar Ashish on November 28, 2016 at 5:00pm — No Comments
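The post's observation that a trained model is just a function from feature vectors to outputs is what makes this kind of extraction possible. As a minimal sketch, assuming the prediction API exposes a plain linear model f(x) = w·x + b and returns exact real-valued scores (both assumptions, not claims about any specific service), d + 1 well-chosen queries recover every parameter: the zero vector reveals b, and each unit vector e_i reveals b + w_i. The `api` function below is a hypothetical stand-in for a remote endpoint.

```python
from typing import Callable, List, Sequence, Tuple

def extract_linear(
    predict: Callable[[Sequence[float]], float], d: int
) -> Tuple[List[float], float]:
    """Recover (weights, bias) of a black-box linear model with d + 1 queries."""
    bias = predict([0.0] * d)  # f(0) = b
    weights = []
    for i in range(d):
        e = [0.0] * d
        e[i] = 1.0  # unit vector e_i, so f(e_i) = w_i + b
        weights.append(predict(e) - bias)
    return weights, bias

if __name__ == "__main__":
    # Hypothetical "remote" model standing in for a prediction API.
    secret_w, secret_b = [0.5, -2.0, 3.0], 1.0
    api = lambda x: sum(w * xi for w, xi in zip(secret_w, x)) + secret_b
    print(extract_linear(api, 3))  # ([0.5, -2.0, 3.0], 1.0)
```

Real services make this harder by returning only class labels or rounded scores, and non-linear models need many more queries, but treating query/response pairs as equations in the unknown parameters is the common starting point.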

13 Great Blogs Posted in the last 12 Months

This is part of a new series of articles: once or twice a month, we repost articles that were very popular when first published. These articles are at least 6 months old but no more than 12 months old. The previous digest in this series was posted here a while back.



Added by Vincent Granville on November 28, 2016 at 2:00pm — No Comments

Why Oxytocin, Dopamine & Adrenaline Are Key to Creating Engaging Data Products

Human behaviors, rituals & habits are the outcome of a complex interplay between people's environment and the experiences they have been exposed to. These factors play a big role in shaping how we interact with products. All of us have intuitively understood the importance of "cognitive resonance" in the first 8 seconds of interacting with a product, and how that experience subsequently shapes our outlook on the product. As…


Added by derick.jose on November 27, 2016 at 10:00pm — 1 Comment

Product recommendations in Digital Age

By 1994 the web had arrived, bringing the power of the online world to our doorsteps. Suddenly there was a way to buy things directly and efficiently online.

Then came eBay and Amazon in 1995. Amazon started as a bookstore and eBay as a marketplace for the sale of goods.

Since then,…


Added by Sandeep Raut on November 27, 2016 at 2:00pm — No Comments


© 2016 Data Science Central
