April 2017 Blog Posts (92)

A Robot Took My Job – Was It a Robot or AI?

Summary:  The argument in the popular press about robots taking our jobs fails in the most fundamental way to differentiate between robots and AI.  Here we try to identify how each contributes to job loss and what the future of AI Enhanced Robots means for employment. 

There’s been a lot of contradictory opinion in the press…


Added by William Vorhies on April 10, 2017 at 2:00pm — 1 Comment

The Startup Founder’s Guide to Analytics

This article was written by Tristan Handy. Tristan is the founder and president of Fishtown Analytics: helping startups implement advanced analytics.

I’m very confident of that, because today, everyone needs analytics. Not just product, not just marketing, not just finance… sales, fulfillment, everyone at a startup needs analytics today.…


Added by Emmanuelle Rieuf on April 10, 2017 at 11:00am — No Comments

Market Shifts in Data Integration Technology

It wouldn’t take a genius to notice the evolution of modern technology. In just the past ten years, we’ve watched the flip-phone transform into the smartphone and the automobile inch towards autonomy. Within our own space, we’ve noticed similar shifts, as the resurgent process of data virtualization grows in importance within the larger data integration market. To further understand the nature of this shift, we turn to Gartner:

As data…


Added by Amy Flippant on April 10, 2017 at 10:30am — No Comments

PySpark Cheat Sheet: Spark in Python

Apache Spark is generally known as a fast, general, open-source engine for big data processing, with built-in modules for streaming, SQL, machine learning and graph processing. It allows you to speed up analytic applications by up to 100 times compared to other technologies on the market today. You can interface Spark with Python through "PySpark", the Spark Python API that exposes the Spark programming model to Python.

The cheat sheet below was produced by…


Added by Vincent Granville on April 10, 2017 at 9:00am — No Comments

Feature Engineering with Tidyverse

In this blog post, I will discuss feature engineering using the Tidyverse collection of libraries. Feature engineering is crucial for a variety of reasons, and it requires some care to produce any useful outcome. In this post, I will consider a dataset that contains descriptions of crimes in San Francisco between…


Added by Burak Himmetoglu on April 10, 2017 at 7:30am — No Comments

Implement an ARIMA model using statsmodels (Python)

This article was written by Michael Grogan. Michael is a data scientist and statistician, with a profound passion for statistics and programming.

In a previous tutorial, I elaborated on how an ARIMA model can be implemented using R. The model was fitted on a stock price dataset, with a (0,1,0) configuration being used for ARIMA.

Here, I detail how to implement an ARIMA model in Python using the…


Added by Emmanuelle Rieuf on April 9, 2017 at 11:00am — No Comments
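
As a rough illustration of the (0,1,0) configuration mentioned in the ARIMA entry above: an ARIMA(0,1,0) model is a random walk, so its point forecast is simply the last observed value, optionally shifted by the average first difference when a drift term is included. The sketch below uses plain Python rather than statsmodels, and the function name `random_walk_forecast` and the sample prices are made up for illustration; it is not the code from the original article.

```python
def random_walk_forecast(series, steps, with_drift=False):
    """Point forecasts for an ARIMA(0,1,0) model (a random walk).

    Hypothetical helper for illustration only.
    series     -- list of observed values, oldest first
    steps      -- number of future periods to forecast
    with_drift -- if True, add the mean first difference per step
    """
    if len(series) < 2:
        raise ValueError("need at least two observations to difference")

    # d = 1: work with first differences of the series.
    diffs = [b - a for a, b in zip(series, series[1:])]

    # With p = 0 and q = 0 the only thing left to estimate is the drift,
    # which is the mean of the differenced series.
    drift = sum(diffs) / len(diffs) if with_drift else 0.0

    last = series[-1]
    return [last + drift * h for h in range(1, steps + 1)]


# Made-up prices, purely for demonstration.
prices = [100.0, 102.0, 101.0, 105.0, 106.0]
flat = random_walk_forecast(prices, 3)
trend = random_walk_forecast(prices, 3, with_drift=True)
print(flat)
print(trend)
```

Without drift, every forecast equals the last observation; with drift, the forecasts extend the average historical change in a straight line.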

The Future of Data Science in One Picture

We have published various "one picture" articles about data science topics. Our readers find them very useful, as they convey as much information in one visual as a long article does.

The picture below is from the Data Science Field Guide published by Booz Allen Hamilton. You can download the guide…


Added by Vincent Granville on April 9, 2017 at 10:00am — No Comments

Spectral Attenuation Monitor

About a month ago in a blog post, I introduced what I described as a “spectral attenuation monitor.” At the time I only had an image from MS Works that…


Added by Don Philip Faithful on April 9, 2017 at 6:30am — No Comments

Weekly Digest, April 10

Monday newsletter published by Data Science Central. Previous editions can be found here.  The contribution flagged with a + is our selection for the picture of the week.

Featured Resources and Technical Contributions


Added by Vincent Granville on April 8, 2017 at 7:30am — No Comments

From Bullock Cart to Hyperloop – Digital Transformation of Travel

Remember when you were a teenager and wanted to go on vacation with your parents, and you were asked to go to the travel agent and get all the printed brochures of exotic locations?
Then came the wave of online booking sites like Expedia, Travelocity, and Makemytrip, which took travel agencies out of…

Added by Sandeep Raut on April 8, 2017 at 6:30am — No Comments

GitHub Profiler: A Tool for Repository Evaluation

Contributed by Evan Frisch. 

GitHub hosts over 84 million repositories, a number that continues to grow rapidly. Software developers must consider a number of important factors as they decide whether to use -- or contribute to -- a project hosted on the site. GitHub Profiler provides a number of indicators that can help with such decisions. With the wealth of public repositories it hosts, GitHub often makes it easy to find many libraries that…


Added by NYC Data Science Academy on April 7, 2017 at 12:30pm — No Comments

Machine-Learning with Renthop

Contributed by David Letzler, Kyle Gallatin and Christopher Capozzola. They…


Added by NYC Data Science Academy on April 7, 2017 at 12:00pm — No Comments

Cleantech in the News: Scraping and Analysis of Online Articles

Contributed by Thomas Kassel. He enrolled in the NYC Data Science Academy 17-week remote bootcamp program, taking place from January to April 2017. This post is based on his third class project focusing on web scraping in Python. The original article can be found…


Added by NYC Data Science Academy on April 7, 2017 at 12:00pm — No Comments

18 Great Blogs Posted in the last 12 Months

This is part of a new series of articles: once or twice a month, we post previous articles that were very popular when first published. These articles are at least 6 months old but no more than 12 months old. The previous digest in this series was posted here a while back.

18 Great Blogs Posted in the last 12…


Added by Vincent Granville on April 7, 2017 at 7:30am — No Comments

Introduction to Anomaly Detection

In this article, Data Scientist Pramit Choudhary provides an introduction to both statistical and machine learning-based approaches to anomaly detection in Python.

Introduction: Anomaly Detection

This overview is intended for beginners in the fields of data science and machine learning. Almost no formal professional experience is needed to…


Added by Emmanuelle Rieuf on April 6, 2017 at 12:30pm — No Comments

Factoring Massive Numbers with Machine Learning Techniques

We are interested here in factoring numbers that are a product of two very large primes. Such numbers are used by encryption algorithms such as RSA, and the prime factors represent the keys (public and private) of the encryption code. Here you will also learn how data science techniques are applied to big data, including visualization, to derive insights. This article is good reading for the data scientist in training, who might not necessarily have easy access to interesting data: here the…


Added by Vincent Granville on April 6, 2017 at 8:00am — 3 Comments

What type are you? Six Job Categories for Data Scientists

Data science was once dubbed the sexiest job of the 21st century by The Harvard Business Review, and data scientists take pride in applying their technical skills to solve problems through data visualization, pattern recognition, text analytics, and data preparation, among many other skills.

Given the various industries that utilize data and draw valuable insights from it to enhance their businesses and services, data scientists play a huge role in the progress of any…


Added by Laura Buckler on April 6, 2017 at 3:00am — 1 Comment

Record linking with Apache Spark’s MLlib & GraphX

The challenge

Recently a colleague asked me to help her with a data problem that seemed very straightforward at a glance.

She had purchased a small set of data from the chamber of commerce (Kamer van Koophandel: KvK) that contained roughly 50k small companies (5–20 FTE), which can be hard to find online.

She noticed that many of those companies share the same address,…


Added by Tom Lous on April 4, 2017 at 11:00pm — 5 Comments

Implementing the Gradient Descent Algorithm in R

This article was posted by S. Richter-Walsh.

A Brief Introduction: 

Linear regression is a classic supervised statistical technique for predictive modelling which is based on the linear hypothesis:

y = mx + c

where y is the response or outcome variable, m is the gradient of the linear…


Added by Emmanuelle Rieuf on April 4, 2017 at 6:00pm — No Comments
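
To make the linear hypothesis y = mx + c from the gradient descent entry above more concrete, here is a minimal sketch of batch gradient descent fitting m and c by minimizing mean squared error. The original article works in R; this sketch uses Python, and the function name `fit_line`, the learning rate, the epoch count and the toy data are all assumptions chosen for illustration, not the author's implementation.

```python
def fit_line(xs, ys, lr=0.05, epochs=2000):
    """Fit y = m*x + c with batch gradient descent on mean squared error.

    Hypothetical helper for illustration only.
    lr     -- learning rate (step size), an assumed value
    epochs -- number of full passes over the data, an assumed value
    """
    m, c = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Prediction errors under the current m and c.
        errs = [(m * x + c) - y for x, y in zip(xs, ys)]

        # Partial derivatives of MSE = (1/n) * sum(err^2).
        grad_m = (2.0 / n) * sum(e * x for e, x in zip(errs, xs))
        grad_c = (2.0 / n) * sum(errs)

        # Step against the gradient.
        m -= lr * grad_m
        c -= lr * grad_c
    return m, c


# Toy data generated from y = 2x + 1 with no noise.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
m, c = fit_line(xs, ys)
print(round(m, 4), round(c, 4))
```

On noise-free data like this the estimates approach the true gradient and intercept; on real data the learning rate usually needs tuning, and a rate that is too large makes the updates diverge.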
