
Zipfian Academy versus Data Science Apprenticeship

Both programs are alternatives to university curricula and traditional education, and both are run by leading industry professionals rather than academic leaders. Zipfian offers a 12-week program; the Data Science Apprenticeship (DSA), organized by us at DSC (Data Science Central), is a 6-month program.


Zipfian is far more expensive ($14,400), requires approval, and is aimed at people with a data background (computer scientists, statisticians, software engineers, etc.). It is offered as face-to-face lectures in San Francisco. In my opinion, it has too much traditional statistics.

DSA is an on-demand, DIY (do-it-yourself) program for self-learners, also aimed at professionals with a data background, focusing on:

  • New material (applied and less applied) designed for big data from the ground up. It contains both technical and nontechnical material, presented at a high level, with side notes for those interested in digging deeper into mathematics or computer science (these side notes can be skipped).
  • A data science cheat sheet that summarizes all you need to know or learn in a few pages.
  • Data sets (not small ones) and real-life projects. Some of these data sets are extracted by participants with a web crawler.
  • Our 300-page Wiley book, to be published in March 2014 (right now, the material is available for free as blog posts throughout DSC).

The DSA is entirely free, though in 2014 participants will have to purchase our book ($40 or so), and we will deliver a certificate and review students' projects (for a fee).

Click here to find other data science programs.

Data Science Apprenticeship (DSA)

Here are three main components of our program:

A few data sets available for download, from the following articles:

The following articles are included in our curriculum, so you can start reading them now:

List of potential projects for students:

Zipfian Academy

Click here to see program details. Below is a snapshot found on their website.

Software Engineering

  • Git and version control
  • Data Structures
    • Dictionaries and Hash Tables
    • Trees (binary, balanced, splay, B)
    • Heaps
    • Stacks and Queues
    • Graphs and Networks
    • Sets
  • Algorithms
    • Search (BFS, DFS, A*, Dijkstra's)
    • Sorting (merge, quick, heap, radix)
    • Selection
  • Performance (Asymptotic Analysis, hardware restrictions, indexing, etc.)

Machine Learning

  • Unsupervised
    • Clustering (K-means, Hierarchical, etc.)
    • Association Analysis (FP-Growth, MDS, etc.)
    • Dimensionality Reduction (PCA, SVD, etc.)
  • Supervised
    • Classification (Naive Bayes, kNN, SVM, etc.)
    • Regression (Linear, Polynomial, Tree, etc.)
  • Recommendation
    • Similarity metrics (Jaccard, Pearson, Euclidean)
    • Item vs. User vs. Content based
    • Limitations (Cold-start, preferences, performance)
  • Optimization (cost functions, hill climbing, etc.)
  • Anomaly Detection and time series
  • Evaluation (Cross Validation, recall, precision, ROC)

Statistics and Probability

  • Descriptive statistics (mean, mode, variance, etc.)
  • Estimation (confidence intervals, sampling, etc.)
  • Correlation (covariance, causation, etc.)
  • Distributions
    • PMF, PDF, CDF, CMF
    • Histograms and Scatterplots
    • Normal, Binomial, Exponential
    • Probability Plot
    • Central Limit Theorem
  • Significance (Hypothesis testing, ANOVA, etc.)
  • Conditional Probability
    • Bayesian Statistics
    • Random Variables and Conditional Distributions
    • Monte Carlo Methods

Utilities: Shell/UNIX

  • Pipes and directing output
  • Essential utilities
    • Explore (head, tail, more, less, grep)
    • Transform (sed, awk, cut, tr, sort, join)
    • Schedule (cron, watch)
    • Visualize (gnuplot)
  • Regular Expressions

Data at Scale

  • MapReduce paradigm (Hadoop)
  • Distributed Datastores (HDFS, Cassandra, HBase)
  • Hadoop Ecosystem (Pig, Hive, HBase, Flume, Sqoop, etc.)
  • Real-Time (Spark, Storm, Shark)
  • Distributed Machine Learning


  • HTTP, APIs, and ReST
  • Parsing
    • HTML and XML
    • JSON
    • PDF
    • CSS and XPath


  • SQL (Postgres, MySQL)
  • NoSQL
    • Document (MongoDB, CouchDB)
    • Graph (Neo4j)
    • Key-Value (Redis, Voldemort)
  • Filesystem and Text


  • Feature Preparation
    • Vectorization (binning, bag of words, tf-idf)
    • Selection (automatic and manual)
    • Normalization
    • Regularization and Smoothing
  • Natural Language Processing
    • N-grams
    • Tokenization
    • Sentiment Analysis


  • Grammar of Graphics (ggplot2, Bokeh)
  • Interactivity (Javascript, D3.js, HTML)
  • Geographic and Maps
  • Charts and Plots (matplotlib)


Comment by Ryan Orban on November 11, 2013 at 3:30pm

Here at Zipfian Academy, we received some questions from Data Science Central readers about our program. Here’s a little bit more about Zipfian Academy:

We fully support open data science education. Resources like MOOCs, Learn Data Science, the Open-Source Data Science Masters, and the DSC Apprenticeship are great ways to get started or brush up on skills. Many of our current students taught themselves data science basics with resources like these before applying to Zipfian Academy. Some of the best resources we've found are highlighted in our Data Science Central post: A Practical Introduction to Data Science.

The experience we offer is different: 12 immersive weeks of in-person learning. Students in the program are attracted to its intensity. With a custom-built curriculum and exercises, learning happens faster than it does through self-study. Access to 480 hours of personalized instruction provides context and assessment that deepens understanding. Instructors and like-minded learners provide live help with troubleshooting, teach good habits, and help overcome stumbling blocks.

Our curriculum is based on feedback from our hiring partners and aims to teach the practical side of data science. As we can’t teach everything in 12 weeks, the focus is on the core set of skills needed for practitioners in the field. Projects in the class mirror the way that data science is practiced at leading technology companies and students analyze data shared by hiring partners and interesting startups alike.

Many students are excited about Zipfian Academy’s relationships with hiring partners, which include Facebook, LinkedIn, and Eventbrite. The program includes exclusive recruiting events with data scientists and recruiters. We also help our students prepare for technical interviews through interview practice and solving data science problems commonly seen in industry.

For more details about Zipfian Academy or to apply for the next cohort, visit


Comment by Suresh Pulipaka on November 11, 2013 at 9:19am

Hello Vincent,

When does the Data Science Apprenticeship course start?


Comment by Dr. Z on November 11, 2013 at 8:53am

It is clear that in terms of accessibility DSA is far better. The Zipfian Academy seems a bit elitist anyway. I don't think anyone who knows how to use a search engine would go for this option. Personally, I find the Persontyle courses more promising, as they combine the best of both worlds.
