I’m attending Rachel Schutt’s Columbia University Data Science course on Wednesdays this semester and I’m planning to blog the class. Here’s what happened yesterday at the first meeting.
Rachel started by going through the syllabus. Here were her main points:
The current landscape of data science
So, what is data science? Is it new? Is it real?
This is an ongoing discussion, but Michael Driscoll’s answer is pretty good:
Data science, as it’s practiced, is a blend of Red-Bull-fueled hacking and espresso-inspired statistics.
But data science is not merely hacking, because when hackers finish debugging their Bash one-liners and Pig scripts, few care about non-Euclidean distance metrics.
And data science is not merely statistics, because when statisticians finish theorizing the perfect model, few could read a ^A delimited file into R if their job depended on it.
Data science is the civil engineering of data. Its acolytes possess a practical knowledge of tools & materials, coupled with a theoretical understanding of what’s possible.
Driscoll also refers to Drew Conway’s Venn diagram of data science from 2010.
We may also want to look at Nathan Yau’s “sexy skills of data geeks” from his “Rise of the Data...”:
But wait, is data science a bag of tricks? Or is it just the logical extension of other fields like statistics and machine learning?