
Peter Higdon's Blog (8)

Where The Cloud Meets The Grid

Companies build or rent grid machines when data length doesn't fit into HDFS, or when the latency of parallel interconnects in the cloud is too high. This review explores the overlap of the two paradigms at the ends of the parallel processing latency spectrum. The comparison is almost poetic and leads to many other comparisons in languages, interfaces, formats, and hardware, but there is surprisingly little overlap.



Your Laptop Is A Supercomputer



To put things in perspective,…


Added by Peter Higdon on November 24, 2014 at 4:35am — No Comments

Your Data Science Portfolio: Math Skills Don't Matter

TL;DR: A Data Scientist is a data pipeline plumber. Analytics are icing, not cake.

This article is written specifically for unemployed and underemployed graduates of math-intensive subjects like physics and statistics. Others may have more to prove.

After writing my introductory reviews of ETL and…


Added by Peter Higdon on August 4, 2014 at 10:13am — 9 Comments

Beyond The Visualization Zoo

NOTE: This article is best viewed in Chrome. Firefox does not display some of the images.

The best document I have read on visualization is "A Tour Through The Visualization Zoo" by Jeffrey Heer, Michael Bostock, and Vadim Ogievetsky. It's a must-read picture book for aspiring Data Scientists. Most of the graphics in this post are examples from the Tour taken from the d3…


Added by Peter Higdon on July 4, 2014 at 12:00am — No Comments

Your Data Science Portfolio: Be An Open Data Curator

In the drive towards the semantic web, mailing lists are ripe, low-hanging fruit. They are full of wisdom that is totally inaccessible to the casual user. To unlock this wealth of knowledge for our apps, we need it in a format like the Stack Exchange data dump.



This data dump format is to Stack Exchange what JSON is to JavaScript: an exchange format that spawns growth in ecosystems…


Added by Peter Higdon on May 27, 2014 at 3:30am — No Comments

The Dangers of the "Talent Shortage" Myth

Every time a new technology disrupts the job market, a "skills shortage" is debated by economists and politicians. The story isn't new - 10 years ago recruiters were asking for programmers with 10+ years of experience in Java. The gap is widened by non-technical recruiters employing rigid traditional hiring practices. The truth is that there are all kinds of smart people with relevant skills who don't fit into HR's pigeonhole - we're generalists, not specialists. The onus…


Added by Peter Higdon on April 28, 2014 at 1:08am — No Comments

The Data Science Toolkit - My Boot Camp Curriculum

This compilation has everything you need to jumpstart your skills in the core tasks of data transformation, modeling, and visualization.



tl;dr: Coursera and Johns Hopkins have a new course called The Data Scientist's Toolbox. https://www.coursera.org/course/datascitoolbox



MODELING



Below is a list of popular analyses from Rexer's 2013 survey. The table is biased towards customer transaction, text,…


Added by Peter Higdon on March 25, 2014 at 9:01am — No Comments

The Data Science Toolkit - The Future Web Toolkit

There's a lot of confusing jargon and buzzwords in this new field. It helps to know who some of the major players are and what services they offer. This list is a gentle introduction and far from exhaustive.



Amazon Web Services: Infrastructure as a service (IaaS). EC2 virtual servers, S3 storage, Mechanical Turk, analytics, and more.

Yandex: Russian competitor to Google. Recently launched the Cocaine server, based on Docker.

Salesforce: Customer Relationship Management…


Added by Peter Higdon on February 25, 2014 at 7:51am — 1 Comment

The Data Science Toolkit - Taking Your First Steps Towards Becoming A Data Scientist

When I stumbled upon the phrase "Data Scientist" 3 years ago, I immediately recognized it as my best prospect for a productive career. How to start? What are the tools of the trade?



This is the blog post I wish I could have read back then.

Many of the things I list here didn't exist or were unstable until recently.



I discovered the "predictive analytics" rabbit hole and started to read and watch whatever I could find on the subject. Upon watching the GigaOM…


Added by Peter Higdon on January 25, 2014 at 8:00am — 4 Comments

© 2019 Data Science Central ®
