InformationWeek has an interview this week with resident Data Science Central blogger Michael Walker about the most common traps awaiting data scientists:
Added by Michael Malak on October 10, 2013 at 8:26am — No Comments
A new 191-page PDF eBook from the National Academies Press, "Frontiers in Massive Data Analysis," is available as a free download (free website registration required):
The first 9 of the 10 chapters offer a comprehensive survey of state-of-the-art big data architectures, machine learning, and analysis techniques.
Chapter 10 really…
Added by Michael Malak on September 23, 2013 at 9:35am — No Comments
There have been various attempts to integrate the D3.js visualization framework into IPython Notebook, in order to provide more visualization options than are available with the standard Matplotlib. In my blog post today, I take one of the better integration attempts out there, port it from Windows to the Mac, and demonstrate:
2. Generating geo color maps in D3.js (not a built-in…
Added by Michael Malak on July 29, 2013 at 4:23am — No Comments
My new blog post covers what I have coined "sparkgrams". It includes an implementation in YUI3 for custom website presentations of data, but I wish R and IPython Notebook had similar functionality.
Added by Michael Malak on June 18, 2013 at 5:17am — No Comments
Spark and Spark Streaming are two components of the "Berkeley Data Analytics Stack" (BDAS). Spark Streaming is one of the few open-source options available for "Real-time Big Data". See my slides and 35-minute presentation from last night, which was part of Global Big Data Week:
Added by Michael Malak on April 24, 2013 at 12:55pm — No Comments
I found it odd that there was no way to automatically deskew data in R, so I wrote a short function to do it. It noticeably improves the performance of linear models and linear support vector machines.
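To give a rough idea of what automatic deskewing can look like, here is a minimal sketch in Python. It is not the R function from the post: the approach (a grid search over Box-Cox exponents that picks the one with the smallest absolute skewness) and the names `skewness`, `box_cox`, and `deskew` are assumptions made for illustration, and the sketch only handles strictly positive data.

```python
import math
import random


def skewness(xs):
    """Skewness as the third standardized moment (population form)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    if m2 == 0:
        return 0.0  # constant data has no skew to measure
    return m3 / m2 ** 1.5


def box_cox(xs, lam):
    """Box-Cox power transform; inputs must be strictly positive."""
    if abs(lam) < 1e-12:
        return [math.log(x) for x in xs]  # limiting case lambda -> 0
    return [(x ** lam - 1.0) / lam for x in xs]


def deskew(xs, lambdas=None):
    """Hypothetical helper: try each exponent on a grid and keep the one
    whose transformed data has the smallest absolute skewness.
    Returns (transformed_values, chosen_lambda)."""
    if min(xs) <= 0:
        raise ValueError("deskew expects strictly positive values")
    if lambdas is None:
        lambdas = [i / 10 for i in range(-20, 21)]  # -2.0 to 2.0 in 0.1 steps
    best = min(lambdas, key=lambda lam: abs(skewness(box_cox(xs, lam))))
    return box_cox(xs, best), best


if __name__ == "__main__":
    random.seed(0)
    # Log-normal data is strongly right-skewed, so it makes a handy demo.
    data = [random.lognormvariate(0.0, 1.0) for _ in range(2000)]
    transformed, lam = deskew(data)
    print("skewness before:", skewness(data))
    print("chosen lambda:", lam)
    print("skewness after:", skewness(transformed))
```

In a modeling workflow, the exponent would be chosen on the training data only and then reused to transform any held-out data, so that both sets share the same scale.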