Columbia data science course, week 1: what is data science?

Cathy O'Neil (mathbabe) describes her experience attending this course.

I’m attending Rachel Schutt’s Columbia University Data Science course on Wednesdays this semester and I’m planning to blog the class. Here’s what happened yesterday at the first meeting.

Syllabus

Rachel started by going through the syllabus. Here were her main points:

  • The prerequisites for this class are: linear algebra, basic statistics, and some programming.
  • The goals of this class are to learn what data scientists do and to learn to do some of those things.
  • Rachel will teach for a couple of weeks, then we will have guest lectures.
  • The profiles of those speakers vary considerably, as do their backgrounds. Yet they are all data scientists.
  • We will be resourceful with readings: part of being a data scientist is realizing lots of stuff isn’t written down yet.
  • There will be 6-10 homework assignments, due every two weeks or so.
  • The final project will be an internal Kaggle competition. This will be a team project.
  • There will also be an in-class final.
  • We’ll use R and Python, mostly R. The support will be mainly for R. Download RStudio.
  • If you’re only interested in learning Hadoop and working with huge data, take Bill Howe’s Coursera course. We will get to big data, but not until the last part of the course.

The current landscape of data science

So, what is data science? Is data science new? Is it real? What is it?

This is an ongoing discussion, but Michael Driscoll’s answer is pretty good:

Data science, as it’s practiced, is a blend of Red-Bull-fueled hacking and espresso-inspired statistics.

But data science is not merely hacking, because when hackers finish debugging their Bash one-liners and Pig scripts, few care about non-Euclidean distance metrics.

And data science is not merely statistics, because when statisticians finish theorizing the perfect model, few could read a ^A delimited file into R if their job depended on it.

Data science is the civil engineering of data.  Its acolytes possess a practical knowledge of tools & materials, coupled with a theoretical understanding of what’s possible.
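
For anyone wondering what a "^A delimited file" in the quote above looks like: ^A is the control character with code 1 (`\x01`), which some big-data tools use as a field separator instead of commas or tabs. The course itself is mostly in R, so treat the following as a minimal Python sketch of the idea rather than anything from the class; the function name and the sample data are made up for illustration.

```python
import csv
import io

CTRL_A = "\x01"  # the ^A control character (ASCII code 1)


def read_ctrl_a_delimited(text):
    """Parse ^A-delimited text into a list of rows (lists of strings).

    Hypothetical helper for illustration. It assumes fields are never
    quoted, which is typical for this kind of dump.
    """
    reader = csv.reader(
        io.StringIO(text),
        delimiter=CTRL_A,
        quoting=csv.QUOTE_NONE,  # treat quote characters as ordinary text
    )
    return [row for row in reader]


# Made-up sample: a header line plus two records.
sample = "id\x01name\x01score\n1\x01ada\x010.9\n2\x01grace\x010.8\n"
rows = read_ctrl_a_delimited(sample)

for row in rows:
    print(row)
```

The only trick is telling the parser which delimiter to use; once the rows are split, the rest is ordinary data munging.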

Driscoll also refers to Drew Conway’s Venn diagram of data science from 2010.

We may also want to look at Nathan Yau’s “sexy skills of data geeks” from his “Rise of the Data...:

  1. Statistics – traditional analysis you’re used to thinking about
  2. Data Munging – parsing, scraping, and formatting data
  3. Visualization – graphs, tools, etc.

But wait, is data science a bag of tricks? Or is it just the logical extension of other fields like statistics and machine learning?

© 2017 Data Science Central