In this 5 Minute Analysis we'll focus on accessing and understanding the Kaggle LA Restaurant & Market Health Data in real time, exploring the data, and pivoting it to report the top violators of the health code and their violations.
Dataset: Kaggle LA Restaurant & Market Health Data
This blog post explores and analyzes the data using Pivot Billions, which is freely available on Docker.
You can now see the columns and types of the dataset and modify them as you see fit. You can also view or change which column or columns are set as primary keys. When you are done viewing or modifying the data structure to be imported, click Import.
Once the data has been imported, you can see and access all 272,801 rows.
By hovering over each column name you can sort the data by that column, view that column’s distribution over all of the data, filter by the data in that column, or rename that column. We’ll view the distribution of the data by the owner’s name.
Click on the second icon (distribution) for the owner_name column to see the distribution of total health code violations by owner.
You can quickly see that Ralphs Grocery and Levy Premium have the highest number of violations. It is worth noting that Levy Premium actually has the highest total violations; however, its data is spread across two slightly different owner names.
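For readers who want to reproduce this kind of check outside the tool, here is a minimal pandas sketch of the same idea: count violations per owner, then count again after merging near-identical owner names. The sample rows, the exact name spellings, and the `normalize_owner` helper are all invented for illustration; only the `owner_name` column comes from the dataset as described above.

```python
import pandas as pd

# Made-up sample: one row per violation. The two "Levy Premium" spellings are
# invented stand-ins for the two slightly different owner names in the data.
rows = pd.DataFrame({
    "owner_name": [
        "Ralphs Grocery Company", "Ralphs Grocery Company", "Ralphs Grocery Company",
        "Levy Premium", "Levy Premium",
        "LEVY PREMIUM,", "LEVY PREMIUM,",
        "Some Other Owner",
    ]
})

# Distribution of violations by owner, largest first.
by_owner = rows["owner_name"].value_counts()


def normalize_owner(name: str) -> str:
    """Hypothetical cleanup: uppercase, drop punctuation, collapse whitespace."""
    kept = "".join(ch for ch in name.upper() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


# Same distribution after merging spellings that normalize to the same name.
by_owner_merged = rows["owner_name"].map(normalize_owner).value_counts()

print(by_owner)
print(by_owner_merged)
```

In this toy sample the raw counts put Ralphs first, while the merged counts put Levy Premium first, which mirrors the observation above. A real cleanup would likely need rules tailored to the actual name variants.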
Now that we know which owners had the highest number of violations, we want to drill down into the data and see which health codes each owner violated. This is made extremely simple and fast using Pivot Billions.
Pivot Billions now quickly reorganizes your data by owner name and violation description and provides a count of how often each unique combination occurs in the data. You can sort by a column or filter the data. Here we’ll filter out the small violation counts, since they are less significant and removing them makes the data more readable.
You can see the filter applied and the data reduced from 128,749 unique combinations to just the 850 highest-count owner-violation combinations.
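The pivot-and-filter step above can be sketched in pandas as a group-by count followed by a threshold filter. This is only an illustration of the idea, not how Pivot Billions works internally: the sample rows are made up, the `violation_description` column name is assumed from the text, and the `MIN_COUNT` threshold is a hypothetical value.

```python
import pandas as pd

# Made-up sample: one row per violation.
rows = pd.DataFrame({
    "owner_name": ["A", "A", "A", "A", "B", "B", "C"],
    "violation_description": [
        "floors", "floors", "floors", "pests", "pests", "pests", "floors",
    ],
})

MIN_COUNT = 2  # hypothetical cutoff for "small" violation counts

# Count how often each (owner, violation) combination occurs.
combos = (
    rows.groupby(["owner_name", "violation_description"])
    .size()
    .reset_index(name="count")
)

# Keep only the larger counts and list the highest first.
top = combos[combos["count"] >= MIN_COUNT].sort_values("count", ascending=False)

print(len(combos), "combinations before filtering")
print(top)
```

On the full dataset the same two steps would take the table from every unique combination down to whichever combinations meet the chosen cutoff.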
We’ll now interactively view the data.
You can now easily sort the data to put the highest-count owner-violation combinations in the top left.
Now select the top-left drop-down box and change it from Table to Table Barchart to view a more visual representation of this data.
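A rough pandas equivalent of this sorted table view is a cross-tabulation with owners as rows and violations as columns, reordered by totals. This is a sketch under assumptions: the sample rows are made up, `violation_description` is an assumed column name, and ordering by row and column totals only approximates the interactive sort (it does not guarantee that the single largest cell lands in the top-left corner).

```python
import pandas as pd

# Made-up sample: one row per violation.
rows = pd.DataFrame({
    "owner_name": ["A", "A", "A", "A", "B", "B", "C"],
    "violation_description": [
        "floors", "floors", "floors", "pests", "pests", "pests", "floors",
    ],
})

# Owners as rows, violation descriptions as columns, counts in the cells.
table = pd.crosstab(rows["owner_name"], rows["violation_description"])

# Reorder so the owners and violations with the largest totals come first.
row_order = table.sum(axis=1).sort_values(ascending=False).index
col_order = table.sum(axis=0).sort_values(ascending=False).index
table = table.loc[row_order, col_order]

print(table)
```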
From our analysis it is clear that Ralphs Grocery Company is not only one of the worst health code offenders, but also has very high counts across a large number of different violations. Although Levy Premium has more total violations when the data for its two owner names is combined, its violations are slightly more consolidated.
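One way to put a number on "spread out" versus "consolidated" is to compare, per owner, how many distinct violations appear and what share of the owner's total comes from its single most common violation. The sketch below is an assumption about how one might measure this, not a metric used in the analysis above; the owners, violations, and counts are invented.

```python
import pandas as pd

# Made-up sample: SPREAD has four violation types with two rows each,
# CONSOLIDATED has most of its rows in a single violation type.
rows = pd.DataFrame({
    "owner_name": ["SPREAD"] * 8 + ["CONSOLIDATED"] * 10,
    "violation_description": (
        ["a", "a", "b", "b", "c", "c", "d", "d"] + ["a"] * 8 + ["b"] * 2
    ),
})

# Count of each (owner, violation) combination.
counts = rows.groupby(["owner_name", "violation_description"]).size()

# Per-owner summary: total violations, number of distinct violation types,
# and the share of the total taken by the most common type.
per_owner = counts.groupby(level="owner_name")
summary = pd.DataFrame({
    "total": per_owner.sum(),
    "distinct_violations": per_owner.size(),
    "top_share": per_owner.max() / per_owner.sum(),
})

print(summary)
```

A higher `top_share` with fewer distinct violations suggests a more consolidated profile; a lower share across many violation types suggests problems spread over many parts of the health code.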
Tags: 5 Minute Analysis, Business Analysis, Data Exploration, Data Wrangling, Docker, EDA, Health, Kaggle, New Tool, PivotBillions