
A few data sets are accessible from our data science apprenticeship web page.

  • Source code and data for our Big Data keyword correlation API (see also section in separate chapter, in our book)
  • Great statistical analysis: forecasting meteorite hits (see also section in separate chapter, in our book)
  • Fast clustering algorithms for massive datasets (see also section in separate chapter, in our book)
  • 53.5 billion clicks dataset available for benchmarking and testing
  • Over 5,000,000 financial, economic and social datasets
  • New pattern to predict stock prices, multiplying returns by a factor of 5 (stock market data, S&P 500; see also section in separate chapter, in our book)
  • 3.5 billion web pages: The graph has been extracted from the Common Crawl 2012 web corpus and covers 3.5 billion web pages and 128 billion hyperlinks between these pages
  • Another large data set - 250 million data points: This is the full-resolution GDELT event dataset, covering January 1, 1979 through March 31, 2013 and containing all data fields for each event record.
  • 125 Years of Public Health Data Available for Download

You can find additional data sets at the Harvard University Data Science website; I was particularly interested in their LinkedIn data set. KDNuggets is also a great resource, and for more, check out this link.

Cross-disciplinary data repositories, data collections and data search engines:

Single datasets and data repositories




Comment by James Theobald on September 12, 2015 at 11:27am is now

Comment by Jeffrey Mather on August 26, 2015 at 11:15am

Thanks for taking the time to gather this

Comment by RAVINDER PAL VASHIST on June 2, 2015 at 12:10am

many thanks Vincent
