A few data sets are accessible from our data science apprenticeship web page.

  • Source code and data for our Big Data keyword correlation API (see also the related chapter in our book)
  • Great statistical analysis: forecasting meteorite hits (see also the related chapter in our book)
  • Fast clustering algorithms for massive datasets (see also the related chapter in our book)
  • 53.5 billion clicks dataset available for benchmarking and testing
  • Over 5,000,000 financial, economic and social datasets
  • New pattern to predict stock prices, multiplying returns by a factor of 5 (stock market data, S&P 500; see also the related chapter in our book)
  • 3.5 billion web pages: The graph has been extracted from the Common Crawl 2012 web corpus and covers 3.5 billion web pages and 128 billion hyperlinks between these pages
  • Another large data set - 250 million data points: This is the full-resolution GDELT event dataset, running from January 1, 1979 through March 31, 2013 and containing all data fields for each event record
  • 125 Years of Public Health Data Available for Download
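
To get a feel for the scale of the web graph listed above, here is a rough back-of-the-envelope sketch in Python. It uses only the two figures quoted in the list (3.5 billion pages, 128 billion hyperlinks). The storage layout (two 8-byte integer IDs per link, uncompressed) is an assumption made for illustration, not a description of how the dataset is actually distributed.

```python
# Figures quoted in the list above for the Common Crawl 2012 hyperlink graph.
PAGES = 3_500_000_000
LINKS = 128_000_000_000


def average_links_per_page(pages: int, links: int) -> float:
    """Average number of hyperlinks per page in the graph."""
    return links / pages


def edge_list_size_bytes(links: int, bytes_per_id: int = 8) -> int:
    """Size of a plain edge list.

    Assumption (hypothetical layout): each link is stored as a
    (source_id, target_id) pair of fixed-width integers, uncompressed.
    """
    return links * 2 * bytes_per_id


avg = average_links_per_page(PAGES, LINKS)
size_tb = edge_list_size_bytes(LINKS) / 1e12

print(f"average links per page: {avg:.1f}")
print(f"uncompressed edge list: about {size_tb:.1f} TB")
```

Under these assumptions the graph averages roughly 37 links per page, and a naive edge list would take about 2 TB, so it is one of the few entries here that reaches terabyte scale before compression.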

You can find additional data sets at the Harvard University Data Science website. I was particularly interested in their LinkedIn data set. KDNuggets is also a great resource, and for more, check out this link.

Cross-disciplinary data repositories, data collections and data search engines:

Single datasets and data repositories


Comment by Mohammed Innat on June 30, 2018 at 7:51am

I have just started learning Big Data, but a few things confuse me. Big data is generally at least terabytes in size, right? But when I follow the links to the data sets listed here, the files are small, megabytes at most. So where can I download TB- or PB-sized data sets to work with? Please correct me if I'm thinking about Big Data the wrong way.

Comment by James Theobald on September 12, 2015 at 11:27am

databib.org is now http://www.re3data.org/

Comment by Jeffrey Mather on August 26, 2015 at 11:15am

Thanks for taking the time to gather this.

Comment by RAVINDER PAL VASHIST on June 2, 2015 at 12:10am

Many thanks, Vincent.

