Michael Malak's Blog (6)

5 Data Science Sins To Beware

InformationWeek has an interview this week with resident Data Science Central blogger Michael Walker about the most common traps awaiting data scientists:

http://www.informationweek.com/big-data/news/big-data-analytics/5-data-science-sins-to-beware/240162426

Added by Michael Malak on October 10, 2013 at 8:26am

Free NAS eBook: "Frontiers in Massive Data Analysis"

A new 191-page PDF eBook published by the National Academies Press, "Frontiers in Massive Data Analysis," is available and can be downloaded for free (after free website registration):

http://www.nap.edu/catalog.php?record_id=18374

The first 9 of the 10 chapters offer a comprehensive survey of state-of-the-art big data architectures, machine learning, and analysis techniques.

Chapter 10 really…


Added by Michael Malak on September 23, 2013 at 9:35am

Choropleth in D3.js and pandas (IPython Notebook)

There have been various attempts to integrate the D3.js visualization framework into IPython Notebook to provide more visualization options than are available with the standard Matplotlib. In today's blog post, I take one of the better integration attempts, port it from Windows to the Mac, and demonstrate:

1. Passing a pandas DataFrame from IPython Notebook into D3.js JavaScript code

2. Generating geo color maps in D3.js (not a built-in…
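
The binning behind a choropleth can be sketched in a few lines. The following is a minimal illustration in plain Python, not the code from the blog post: it maps each region's value into one of several equal-width bins, similar to D3's quantize scale, and serializes the resulting region-to-color mapping as JSON for the JavaScript side. The region codes, values, and palette are hypothetical. With pandas, `DataFrame.to_json(orient="records")` is one way to produce a JSON string for the first step.

```python
import json

# Hypothetical sample data: region code -> value (e.g. a rate in percent).
# In a notebook this would typically come from a pandas DataFrame column.
RATES = {"CO": 7.1, "WY": 4.6, "UT": 4.7, "NM": 6.9, "KS": 5.4}

# Illustrative sequential palette, lightest to darkest.
PALETTE = ["#eff3ff", "#bdd7e7", "#6baed6", "#3182bd", "#08519c"]


def quantize(value, lo, hi, n):
    """Map value in [lo, hi] to one of n equal-width bins (like D3's quantize scale)."""
    if hi == lo:
        return 0  # degenerate domain: everything falls into the first bin
    idx = int((value - lo) / (hi - lo) * n)
    return min(idx, n - 1)  # the maximum value belongs to the last bin


def choropleth_colors(values, palette):
    """Assign each region the palette color of the bin its value falls into."""
    lo, hi = min(values.values()), max(values.values())
    return {region: palette[quantize(v, lo, hi, len(palette))]
            for region, v in values.items()}


colors = choropleth_colors(RATES, PALETTE)
# JSON string that could be injected into the page for D3.js to read.
payload = json.dumps(colors, sort_keys=True)
print(payload)
```

Doing the binning on the Python side keeps the JavaScript minimal. The alternative is to pass raw values and let D3 compute the scale itself.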


Added by Michael Malak on July 29, 2013 at 4:23am

SparkGrams: compact in-spreadsheet histograms

My new blog post describes what I have coined "sparkgrams." It includes an implementation in YUI3 for custom website presentations of data, but I wish R and IPython Notebook had similar functionality.

http://technicaltidbit.blogspot.com/2013/06/histogram-thumbnails-inside-yui3-data.html
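
For a rough idea of what a sparkgram could look like outside YUI3, here is a minimal text-only sketch in Python. It is not the YUI3 implementation from the post. Instead, it swaps in Unicode block characters to draw a one-line histogram that could sit in a table cell or a notebook column. The function name `sparkgram`, the equal-width binning, and the default of 8 bins are assumptions for illustration.

```python
# Eight Unicode block glyphs, from shortest to tallest bar.
BLOCKS = "▁▂▃▄▅▆▇█"


def sparkgram(values, bins=8):
    """Render a histogram of `values` as a one-line string of block glyphs."""
    lo, hi = min(values), max(values)
    # Equal-width bins; fall back to width 1.0 when all values are identical.
    width = (hi - lo) / bins or 1.0
    counts = [0] * bins
    for v in values:
        idx = min(int((v - lo) / width), bins - 1)  # max value goes in last bin
        counts[idx] += 1
    peak = max(counts)
    # Scale each count to a glyph height; empty bins render as a space.
    return "".join(
        " " if c == 0 else BLOCKS[round(c / peak * (len(BLOCKS) - 1))]
        for c in counts
    )


print(sparkgram([1, 2, 2, 3, 3, 3, 4, 4, 5], bins=5))  # prints ▃▆█▆▃
```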

Added by Michael Malak on June 18, 2013 at 5:17am

Spark Streaming: Real-time Hadoop

Spark and Spark Streaming are two components of the "Berkeley Data Analytics Stack" (BDAS). Spark Streaming is one of the few open source options available for "Real-time Big Data". See my slides and 35-minute presentation from last night, which was part of Global Big Data Week:

http://technicaltidbit.blogspot.com/2013/04/presentation-on-spark.html

Added by Michael Malak on April 24, 2013 at 12:55pm

Automatically deskew before machine learning in R

I found it odd that there was no way to automatically deskew data in R, so I wrote a short function to do it. It noticeably improves the performance of linear models and linear support vector machines.

http://technicaltidbit.blogspot.com/2013/03/automatically-deskew-before-machine.html
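
The linked post has the actual R function. As a language-neutral illustration only, here is a minimal Python sketch of one common approach: compute each column's sample skewness and apply a shifted log transform to columns that are strongly right-skewed. The threshold of 1.0, the `log(x - min + 1)` transform, and the names `skewness` and `deskew` are assumptions, not necessarily what the R function does.

```python
import math


def skewness(xs):
    """Sample skewness g1 = m3 / m2**1.5, using population central moments."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5 if m2 > 0 else 0.0


def deskew(columns, threshold=1.0):
    """Log-transform every column whose skewness exceeds `threshold`.

    Assumption: only right-skewed columns are handled. Shifting by the
    column minimum keeps the log argument >= 1, so negative values are fine.
    """
    out = {}
    for name, xs in columns.items():
        if skewness(xs) > threshold:
            shift = min(xs)
            out[name] = [math.log(x - shift + 1) for x in xs]
        else:
            out[name] = list(xs)  # leave roughly symmetric columns untouched
    return out


# Hypothetical example: one heavily right-skewed column, one symmetric column.
columns = {"income": [1, 2, 3, 4, 100], "age": [20, 30, 40, 50, 60]}
deskewed = deskew(columns)
print(skewness(columns["income"]), skewness(deskewed["income"]))
```

The transform is applied before model fitting, so the same shift should be reused when transforming new data at prediction time.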

Added by Michael Malak on March 9, 2013 at 2:00pm

© 2020 TechTarget, Inc.