In this 5 Minute Analysis, we'll explore data describing the full collection of Kaggle datasets in real time, reorganize it, and filter it to find popular datasets that have many downloads but very few kernels.
Dataset: Complete Kaggle Datasets Collection
This blog post explores and analyzes the data using PivotBillions, which is freely available on Docker.
You can now see the columns and types of the dataset and modify them as you see fit. You can also view or change which column or columns are set as primary keys. When you are done viewing or modifying the data structure to be imported, click Import.
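Setting column types and a primary key amounts to casting each field on import and requiring the key to be unique per row. The plain-Python sketch below illustrates that idea only; it is not PivotBillions' import mechanism, and the column names, types, and key (`Title`, `Description`, `Downloads`, `Kernels`, `PRIMARY_KEY`) are hypothetical assumptions rather than the actual schema of the Kaggle collection.

```python
import csv
import io

# Hypothetical schema: these column names and types are assumptions,
# not the real columns of the Complete Kaggle Datasets Collection.
SCHEMA = {"Title": str, "Description": str, "Downloads": int, "Kernels": int}
PRIMARY_KEY = ("Title",)  # assumed primary-key column(s)


def load_typed(text, schema=SCHEMA, key=PRIMARY_KEY):
    """Parse CSV text, cast each column to its declared type, and enforce a unique key."""
    rows = []
    seen = set()
    for raw in csv.DictReader(io.StringIO(text)):
        # Cast every declared column; a bad value raises ValueError here.
        row = {col: cast(raw[col]) for col, cast in schema.items()}
        k = tuple(row[c] for c in key)
        if k in seen:
            raise ValueError(f"duplicate primary key: {k}")
        seen.add(k)
        rows.append(row)
    return rows


sample = """Title,Description,Downloads,Kernels
Dataset A,First example,1200,0
Dataset B,Second example,50,12
"""
rows = load_typed(sample)
print(rows[0]["Downloads"] + 1)  # 1201, because Downloads was cast to int
```

Declaring `Downloads` and `Kernels` as integers up front is what makes the later sums and numeric filters meaningful; left as strings, they would compare lexically.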
Now that we can view and explore the data, let’s reorganize it to focus on datasets whose downloads far exceed their kernel use.
PivotBillions now quickly reorganizes your data by dataset title, description, and number of kernels. It also provides counts, sums, and statistics on the downloads of each dataset. You can sort by a column or filter the data. Here we’ll add some filters to restrict the data to datasets with many downloads but only a few kernels.
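The pivot-and-filter step can be sketched in plain Python as a group-by on (title, description, kernels) followed by a threshold filter. This is an illustrative analogue, not PivotBillions' API: the field names, sample records, and the thresholds `min_downloads=1000` and `max_kernels=5` are hypothetical, since the post does not state the exact filter values used.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records; field names and values are illustrative only.
records = [
    {"title": "A", "description": "x", "kernels": 0, "downloads": 3000},
    {"title": "A", "description": "x", "kernels": 0, "downloads": 2500},
    {"title": "B", "description": "y", "kernels": 40, "downloads": 9000},
    {"title": "C", "description": "z", "kernels": 2, "downloads": 100},
]


def pivot(rows):
    """Group rows by (title, description, kernels) and summarize downloads per group."""
    groups = defaultdict(list)
    for r in rows:
        groups[(r["title"], r["description"], r["kernels"])].append(r["downloads"])
    return {
        key: {"count": len(v), "sum": sum(v), "mean": mean(v), "max": max(v)}
        for key, v in groups.items()
    }


def popular_but_unexplored(summary, min_downloads=1000, max_kernels=5):
    """Keep groups with many total downloads but few kernels (thresholds are assumptions)."""
    return {
        key: stats
        for key, stats in summary.items()
        if stats["sum"] >= min_downloads and key[2] <= max_kernels
    }


summary = pivot(records)
filtered = popular_but_unexplored(summary)
# Sort the surviving groups by total downloads, largest first.
ranked = sorted(filtered.items(), key=lambda kv: kv[1]["sum"], reverse=True)
print(len(summary), len(filtered))  # 3 1
```

Each key in `summary` corresponds to one "unique combination" in the pivot view, so comparing `len(summary)` with `len(filtered)` mirrors how the filtered view shrinks the number of combinations.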
The filters are applied immediately, reducing the data from 7,666 unique combinations to just the 612 that match our filters.
We’ll now interactively view the data.
We can immediately see a variety of very popular datasets that have been downloaded thousands of times yet have very few or no kernels developed. Many of these are likely underutilized datasets that aren’t easily understood with existing tools and could benefit from further exploration and analysis with new tools such as PivotBillions.
© 2021 TechTarget, Inc.