After posting my article A Plethora of Data Set Repositories, I received a few messages from people and companies that have also collected a lot of data and share it with the public. Below are two such sources. You can also find new data sets by searching on DSC.

200,000 Tokyo geolocated tweets. Free Twitter Dataset

1. Data.world

Featured data sets include:

  • Support for Legal Marijuana
  • Austin Affordable Housing Inventory
  • US Campaign finance stats from the Federal Election Commission
  • Shakespeare Word Frequencies
  • Climate Change Data

The data sets that I checked were available in CSV format and were rather small. They can be found here.
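For a first look at a small CSV file of this kind, a minimal Python sketch along these lines may help. It assumes the file has a header row. The sample rows and the column names `word` and `count` are made up for illustration and are not taken from Data.world.

```python
import csv
import io


def summarize_csv(text):
    """Return (column names, number of data rows) for CSV text with a header row."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)  # reading the rows also populates reader.fieldnames
    return reader.fieldnames, len(rows)


# Hypothetical sample standing in for the contents of a downloaded file.
sample = "word,count\nthe,27\nand,26\n"
columns, n_rows = summarize_csv(sample)
print(columns, n_rows)
```

For a real file, the text could be read with `open(path, newline="").read()` before calling `summarize_csv`.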

2. Webrobots.io

They have a scraper robot which crawls Indiegogo projects and collects data about them. The robot was launched in May 2016, and they run the crawl once a month. The first dataset contains data about 91.5k projects. The data can be found here.

They also have a scraper robot which crawls all Kickstarter projects and collects data in JSON format. Since March 2016, they have run this crawl once a month. The data can be found here.
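The exact layout of these JSON files is not described here, so the following Python sketch is only a starting point. It assumes the file is either a single JSON array or one JSON object per line. The field names `name` and `goal` in the sample are hypothetical and are not taken from the Webrobots.io data.

```python
import json


def load_records(text):
    """Parse text that is either a JSON array or one JSON object per line."""
    stripped = text.strip()
    if stripped.startswith("["):
        # Whole file is a single JSON array of records.
        return json.loads(stripped)
    # Otherwise treat each non-empty line as its own JSON object.
    return [json.loads(line) for line in stripped.splitlines() if line.strip()]


# Hypothetical sample standing in for a downloaded crawl file.
sample = '{"name": "Project A", "goal": 1000}\n{"name": "Project B", "goal": 2500}\n'
records = load_records(sample)
total_goal = sum(r["goal"] for r in records)
print(len(records), total_goal)
```

Checking the first few characters of a real file shows which of the two layouts applies, and the function can be simplified accordingly.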

If you are aware of other valuable data sources, especially big data (more than a few gigabytes, or streaming data), please mention them in the comment section below.
