
It is interesting to see what Harvard considers to be data science. They use Python in all projects and training (there's nothing wrong with that, though exposure to other languages such as R, Stata, and SQL would be great on top of Python). The curriculum is too traditional, and in particular too heavy on statistics. I did not see anything about machine-to-machine communications (e.g. keyword bidding), processing real-time data, the curse of big data (and how to address it), API building and implementation, or automation. Too much time is spent on old regression and clustering methods. Exploratory analysis is something that should be automated. Experimental design needs to be added. Their recommended reading list is biased towards traditional data analysis.

Content

  • data munging/scraping/sampling/cleaning in order to get an informative, manageable data set;
  • data storage and management in order to be able to access data - especially big data - quickly and reliably during subsequent analysis;
  • exploratory data analysis to generate hypotheses and intuition about the data;
  • prediction based on statistical tools such as regression, classification, and clustering; and
  • communication of results through visualization, stories, and interpretable summaries.


The three modules are as follows:

  • Prediction and elections module: how did Nate Silver predict 50 out of 50 states correctly in the 2012 U.S. presidential election, and 49 out of 50 correctly in the 2008 election? How much of that was luck? We will discuss how to find, process, combine, visualize, simulate, and summarize election-related data and questions, especially if there are conflicting polls with different reliabilities.
  • Recommendation and business analytics module: the Netflix Prize was a famous recent example of collaborative filtering: given information about which movies various users have liked and disliked, how should Netflix make recommendations for what movies a user should watch? Many other companies are interested in closely related problems. Often there is a very large but very sparse data set (e.g., there could be millions of users and tens of thousands of movies, but very few users rate more than a few hundred movies). We will explore techniques for working with such data.
  • Sampling and social network analysis module: social, biological, and technological networks are attracting interest from many fields. They are examples of relational data, in which there are measurements on pairs of individuals, not just on individuals. But computation and visualization for a network with more than, say, 50 nodes (individuals) presents many challenges in scalability and interpretability. We will study techniques for drawing a sample from a network, for analyzing network data (e.g., finding “communities” and “influential” nodes in the network), and for visualizing network data.
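
As a rough illustration of the collaborative filtering idea in the recommendation module, here is a minimal sketch in Python (the language the course uses). It is not taken from the course materials: the toy ratings, the function names, and the choice of user-based cosine similarity are all assumptions made for illustration. Ratings are stored as nested dictionaries so that the many missing user/movie pairs take no space.

```python
from math import sqrt

# Hypothetical toy ratings: user -> {movie: rating on a 1-5 scale}.
# Nested dicts keep the data sparse: unrated movies are simply absent.
ratings = {
    "ann": {"Alien": 5, "Brazil": 4, "Casablanca": 1},
    "bob": {"Alien": 4, "Brazil": 5, "Duck Soup": 2},
    "cat": {"Casablanca": 5, "Duck Soup": 4, "Alien": 1},
    "dan": {"Alien": 5, "Brazil": 4},
}


def cosine_similarity(a, b):
    """Cosine similarity between two sparse rating vectors (dicts)."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[m] * b[m] for m in shared)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


def predict(user, movie):
    """Similarity-weighted average of other users' ratings for `movie`."""
    num = den = 0.0
    for other, their in ratings.items():
        if other == user or movie not in their:
            continue
        sim = cosine_similarity(ratings[user], their)
        num += sim * their[movie]
        den += sim
    return num / den if den else None


def recommend(user, k=1):
    """Return the k unseen movies with the highest predicted rating."""
    seen = set(ratings[user])
    candidates = {m for r in ratings.values() for m in r} - seen
    scored = [(predict(user, m), m) for m in candidates]
    scored = [(s, m) for s, m in scored if s is not None]
    # Sort by descending score, breaking ties alphabetically.
    return [m for s, m in sorted(scored, key=lambda t: (-t[0], t[1]))[:k]]


print(recommend("dan"))
```

Here "dan" rates like "ann" and "bob", so the prediction for an unseen movie is dominated by their ratings rather than by those of "cat". A real system at Netflix scale would need sparse matrix libraries and a more careful model; this only shows the shape of the computation.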

Prerequisites

Programming knowledge at the level of CS 50 or above, and statistics knowledge at the level of Stat 100 or above (Stat 110 recommended).

Data sets

On the plus side, they offer an interesting list of available data sets for prospective students, including LinkedIn data.



Comment by Peter Goldey on December 16, 2014 at 4:00am

Hi Vincent - 

I'm a Harvard grad - many years ago. Discussions about higher education are complex. Personally, I'm not old money, old boys club, etc. Neither wealthy nor well connected. But I think you are both close to and far from the value of a Harvard experience. Like many schools, part of the value is the network that you build there and that Harvard offers - it really is a big part of the value. But also, while at this (or many other) universities, you have the opportunity to be part of an informed, interested, active community. One that happens to have tremendous resources and gives its students great freedom to explore.

Given that Harvard in particular has a need-blind admissions policy and tremendous financial support for its students (not to say it doesn't still cost $$'s), the question should really be whether the environment and experience are optimal. Personally, I had a tremendous time, learned a lot about myself and how to think about and approach questions and situations, and collaborated with extremely talented people.

I did that without a specific post-grad career focus...I actually studied folklore and mythology (really the study of why people believe certain things) and then self-taught everything about my career in data / analytics from there.

There are certainly a number of people at Harvard that had that experience handed to them and will walk out with old money careers and connections just because of their family connections.  But that shouldn't minimize what the majority of students there experience.

I wouldn't say Harvard is the best place to get a Data Science degree...at least not at this point.  The question will be whether they can attract the cutting edge thinkers and contributors to their faculty that they have in other disciplines.

Comment by abbas Shojaee on December 20, 2013 at 5:07am

Prestigious universities, especially those departments which are not dedicated to computational sciences (e.g. in healthcare, social, or management sciences), are seriously lagging in adopting new data science methods, and their viewpoint on data science is more conventional.

Comment by Steve Miller on December 20, 2013 at 4:52am

For just a single course, the curriculum looks excellent to me. I'd be delighted to speak with graduates who successfully completed this class about apprentice data science positions with my consulting company, OpenBI.

Comment by Brian Feeny on December 19, 2013 at 1:44pm

Disclaimer:  I am a student at Harvard, and just completed this course for my degree program.

The class is just one of many classes Harvard offers that would be applicable to "Data Science".  The class does not try to be all things to Data Science, nor does it make those claims.  We just had our final class projects, and I must say they were very impressive.  Many of the techniques used were not directly taught in the course.  The course is designed to expose students to Data Science, and it does a good job of that.  It teaches you some phenomenal things.  I can tell you I worked on this class 30+ hours some weeks, it was very intense and very rewarding.

Harvard has plenty of stats classes, classes on machine learning, visualization, data mining, programming, etc.  This class did a great job of mixing a little bit of everything, and was challenging.  You also have to consider that this is the first time this course was given.  We had 300+ people enrolled, not as a MOOC but as actual students.  The materials, lectures, sections, attention given by the staff, etc. were all what you would normally find in a much more mature course.  There was also feedback solicited and given at every interval.  I have no doubt this class will continue to evolve.

I come from a background with R. I had NO previous experience in Python before the class.  By the end of the class, I was efficiently and easily doing advanced things in Python.  I had built an LDA model in which all the text processing was done over multiple cores using IPython clustering, and the LDA model itself was distributed across cores as well.  The data was about 1 million reviews of American K-12 schools.

My classmates could not work on this dataset in R; they would have had to use a premium version which supported enough memory, or persist to disk using libraries.  Python was a good choice.  Python vs. <insert any language here> is not the point, the methodology is the point.  We selected models, extracted features, trained models, measured our predictions, grid searched for hyperparameters, cleaned data, scraped data, visualized data, normalized data, etc, etc, etc.  We did a lot of all of this.
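
For readers unfamiliar with the workflow described above (train a model, measure its predictions, grid search for hyperparameters), here is a minimal sketch in plain Python. It is not the code from the class or from any student project: the one-dimensional ridge regression, the toy data, and all function names are assumptions chosen to keep the example self-contained.

```python
from itertools import product


def k_fold_indices(n, k):
    """Yield (train_idx, valid_idx) pairs for k contiguous folds."""
    fold = n // k
    for i in range(k):
        start = i * fold
        stop = n if i == k - 1 else start + fold
        valid = list(range(start, stop))
        train = [j for j in range(n) if j < start or j >= stop]
        yield train, valid


def fit_ridge(xs, ys, alpha):
    """1-D ridge regression through the origin: w = sum(xy) / (sum(x^2) + alpha)."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + alpha)


def mse(w, xs, ys):
    """Mean squared error of the prediction w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def grid_search(xs, ys, grid, k=3):
    """Try every combination in `grid`; return (best mean CV error, params)."""
    best = None
    for values in product(*grid.values()):
        params = dict(zip(grid.keys(), values))
        scores = []
        for train, valid in k_fold_indices(len(xs), k):
            w = fit_ridge([xs[i] for i in train], [ys[i] for i in train], **params)
            scores.append(mse(w, [xs[i] for i in valid], [ys[i] for i in valid]))
        score = sum(scores) / len(scores)
        if best is None or score < best[0]:
            best = (score, params)
    return best


# Hypothetical toy data, roughly y = 2x with a little noise.
xs = [1, 2, 3, 4, 5, 6]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]
grid = {"alpha": [0.0, 1.0, 10.0]}
best_score, best_params = grid_search(xs, ys, grid)
print(best_params, round(best_score, 4))
```

The point of the pattern is that the hyperparameter is chosen by held-out error rather than by fit on the training data. In practice a library such as scikit-learn would normally handle the folds and the search.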

The final projects were intense.  We had 3-4 people per group working endless hours to put together websites, and original analysis.  

I applaud Harvard for making the class available to others and publishing the materials.

Harvard is a top school for STEM, and there are many talented people at the school.  I think it's great that the school not only has this class, but opens it up to basically anyone in the school who has taken a basic programming class.  I have taken many classes at Harvard (3rd year grad student) and I have taken classes elsewhere, and I can tell you this was one of the most impressively run classes I have taken anywhere.

Comment by Vincent Granville on December 18, 2013 at 10:49pm

@Myles: It depends on the amount of effort and money you need to spend to get the certification or degree in question. I believe programs like Harvard's are for the very wealthy and well connected, and the value of the degree is the connections that you will make during the training (indeed, this is a tremendous value), to help you land a Wall Street job or VC funding for your company. Without such a degree, the door to the old boys clubs - the people who manage this country - will always be shut (there are some rare exceptions). But the intrinsic technical value of the degree is limited: everything that you can learn by yourself on the Internet should be free, and it is actually free, if you are smart enough to know where to find the knowledge in question.

Even if you manage to be accepted at Harvard (other than for programs aimed at CEOs) but were not born into wealth, Harvard has very little value, as you will never be integrated with the kids of the rich and famous. You can learn more up-to-date, comprehensive stuff in programs that take much less time and much less money. Also, creativity and innovation are not developed or fostered in many of these programs. Creative people keep being creative despite all the efforts by schools - from first grade to PhD - to kill creativity.

Disclaimer: I was born in poverty, the bottom 1% - we did not have a phone, heating, air conditioning, a car, a TV set, or connections that could help me land any kind of job in an economy plagued by 20% unemployment, not even hot water when I was a kid. My grandparents could not read or write. In my family in communist Wallonia, nobody had ever heard of Harvard, Yale or Princeton. Most of what I learned (especially advanced knowledge), I learned by myself, not at school. Somehow I managed not to stay poor (to say the least); I hope this is encouraging news for all the smart people who cannot afford Harvard.

Comment by Myles Gartland on December 18, 2013 at 4:56pm

Hi Vincent. While I don't disagree with you on the things that are missing, I am not sure I see that many problems with what they offer either. For most people and most companies this would be a huge step up from their daily Excel chugging. So, I think it is a matter of perspective.


© 2017   Data Science Central   Powered by
