Home » Uncategorized

5 Minute Analysis: Breaking Down Health Code Violations

In this 5 Minute Analysis we’ll focus on accessing and understanding the Kaggle LA Restaurant & Market Health Data in real-time, exploring the data, and pivoting the data to report the top violators of the health code and their violations.

Dataset: Kaggle LA Restaurant & Market Health Data

This blog post explores and analyzes the data using PivotBillions, available freely on docker.

Docker Image

Goals

  1. Load the data to Pivot Billions and view its structure.
  2. Explore the data using Pivot Billions built-in features.
  3. Pivot the data to organize it by violator name and violation to see the worst violators and report our findings.

Steps

Load the Data and View its Structure

  1. Download the dataset from Kaggle.
  2. Unzip your downloaded data.
  3. Access the Pivot Billions URL for your machine.
  4. Click the Plus (+) icon on the top right hand side of the window.
  5. Select Drag & Drop.

zbnqchne1HXdgZSlqXPwFZObm3kBzCP2B6ycBEWP5bmMX1aLWlMU_FEIscfDfIARTmYe-Z9qdnxKh0NSMADHSTkavA1HPH3p13n1tx8QYIlr5WlxgfX1VOe38F-c–1Lepo_PlFM

  1. Drag your downloaded “restaurant-and-market-health-violations.csv” file to the Drag & Drop box in Pivot Billions.
  2. Click the dropdown arrow qYTO66L_zEt7_OmFbgm0TiKxWv8tMz67u2FgauYF3v_eo2Fz1Yxl6NAuyTh2pYXv9BPVAC9fhukbc62-UbWWY41NNTkoRMQqLcfEscQWPYMLBiXVu4jCVZD2qSNnRi5lUzjQaw_q to the right of the file in Pivot Billions to view the schema of the data and see a sample.
  3. Then select the left checkbox next to the file and click Preview at the bottom of the screen.

y4FdHIYYQOCEPxh7W7kY_Pk1zF_VNjej7xkK7WY0aDV4KJ3RHALMeHOEJSs7EYA8sO65HtC-ju_30poXfZbTCA-_otGDYtPHvJ7x4TrZpmG23V5vCLahzbRLtot-8lD5uPIn-zux

You can now see the columns and types of the dataset and modify them as you see fit. You can also view or change which column or columns are set as primary keys. When you are done viewing or modifying the data structure to be imported, click Import.

e5Uv9xm0TOXbAV2cjF6cfxFpi-UoMcvlDpdHzKZl_ZxLkf899dtuP-P_X-d_4QeZ3IUjej8w_QEPnsW7_EE0NBKPwZrnPGoBzL6vjDKy4aZ79d_-HqhH_6z5jGly9cFltTxbje18

View and Explore the Data

After the data has been quickly imported you can now see and access all 272,801 rows of the data.

GyU8avwNW7gG5yT3DjA5zxAW9DDJ5-Et0vg9VPSPhiDCiFcE7PEuaGneYqHVH-fgIwk8QJ-LxqdCTQLlLiQmkdxA7muDAsk-068MrebEPrNZN7Rx8aC9Q0eP_JH1ms_q9mBX1jev

By hovering over each column name you can sort the data by that column, view that column’s distribution over all of the data, filter by the data in that column, or rename that column. We’ll view the distribution of the data by the owner’s name.

Click on the second icon (distribution) in GPGBa7gEjNi0OngTqo7-K0GgukHk32Ca3aeglP4k7EbbS3TdKVZOPAi5gPjBs_1hJ8Qr6KHCFiBL-hsOmFIA-_Z3-BZDKAzfTj23kLz41MDz48i9QbZz_iicb4HTWlq3nrLeB1xT for the owner_name column to see the distribution of total health code violations by owner.

z5iS3jiPVe1uFhiEPyHvXnLWwQV1lMgo1HpSJ5Q1P8BOsgBlmvRc-fizRJcEguACmiKt0_-EdEM-tqCDt-TK9f2qIEO075FqX4iywjsiFvqwRCJzyb1oCXgU2w4QU17QdgbwPYcW

You can quickly see that Ralphs Grocery and Levy Premium have the highest number of violations. It is worth noting that Levy Premium actually has the highest total violations; however, its data is spread across two slightly different owner names.

Pivot and Report the Data

Now that we know which owners had the highest number of violations, we want to drill down into the data and see which health codes each owner violated. This is made extremely simple and fast using Pivot Billions.

  1. Click the Pivot icon 9ahR3lLJQf1HCB1og9rnZXFS-Qt5w_14LWNoMEnrv2o1Vya_vyiTyZtAPFevPAEW-QNFqXcDV2IysSBTI-9fKlqrRaHUPVgb1KEJIpsA-F9oJ22nlj0CzaIR1NT99zYJ_yzlcS6D in the the top right of your data table.
  2. Click the Plus (+) icon under Dimensions and select the “owner_name” column.
  3. Click the Plus (+) icon again and select the “violation_description” column.
  4. Click View to pivot your data.

7j8THutenQfovvgjpBHGNSc_tC5WqeQvRlljnD_TSae3aek1drn3uv0b8tLBxycGTOxNDP_UinkE9603Hk1mr_oSfH3NT47yYvuRNfetH4llqUCKBxEwPPX4Fu1y7lulFat28nYA

Pivot Billions now quickly reorganizes your data by owner name and violation description and provides counts of each unique combination’s occurrence in the data. You can sort by a column or filter the data. Here we’ll filter out the small violation counts since they are less significant and it makes the data more readable.

  1. In the top-left of the pivot widget, click the Plus (+) button.
  2. Select “Count” and “Greater Than”.
  3. Enter 10 and press enter.

uiWol_E-9qmU9emhIOn3DV74aJZql_Meyj_uhD6BAbxZ6uJ5XLxgGfyoQPCcoAOlxi9pUO3mnZoKC3YBsmM1qz4BUVZyZerfAKzLAWlafrvpT29lYJsYVbp19J4pbOIihuaYFNJ0

You can see the filter applied and the data reduced from 128,749 unique combinations to just the top 850 worst owner violations.

3IYThbSeUPAvwob6AkOWhkHHfnwqLP2BWHxWdCnEGtRDoY6ihQ6iNvh-1CgO98c4KOdAa4GNzTZPfkZdPFpB6GM1pQXXzFdd3i3NOmAOAjGXnwaQQqxHftfdGU5yTgAbRbKTFwE3

We’ll now interactively view the data.

  1. Click the Switch View Type 4FbOpCUBGZD-_w0MWTcmzFtwvlAuv7Kgc4WyhMex7G5qo9m0lm58HdfFnQ9poaxisj7NUPPe22PhxHIqOTD1YBgMUEj-HtTqBtvkY47Gsky8qisys6e3uQ1L9_y0o3hw6g1W2bF6 icon in the top right of the pivot widget and select Pivot View.
  2. Drag the owner_name box to below the drop down selection box and the violation_description box to the right of the drop down selection box as shown below.

bMxdF00wtjZ7hLXHnyAX9biwZ0uxvmWKVbLnEU5qgCOsXOirGfUDY-De_QtUr3NqRuGhmnEEJzmx65P_b5kwrVkiPM0xY0bFQwNe-2tlKLeltzMsui-ti0pTGVnstvao2LI1Uyj7

You can now easily sort the data to put the highest-count owner violation combinations in the top left.

  1. Click the AZPZnokbK-l4dvzfmlNN8Vc4Lna8X9hmuP6u9NEeJuc6U4FixsA4_zmw36d0hsvRC25GGK2hYCtE9bzYeB6Xsc6OEPqy03T6avrMB1n-vy_w_l8LYXfwVsm4q4Pt_oBfJHxKQygB arrow twice.
  2. Click the jSiWazK3Wd5FSfOjiM-kd0LXLI9qkJjsYO1JNGekM6VwKl0UP8I6NV1Bn6g0ceNxEwJQQA4Tq170e_ZIC6K_MZeCqiVKzmkBMTnCA5_5e_KDusFcyuLPIh8Vobz9mJH-Z1MMlnj3 arrow twice.

F6R15ff3XQEwLWtyEvHV24-h5Nn9Rpoz-TkPtZdW7vzoIm_QxA-JmdFuaa2A_oVZn8iqiqnAWBpXT6lMJbH_bKmLYSPdqgEgnBQPaAXedXlFLcfOeawGRakSgDxwEY8SuN4zeGvj

Now select the top-left drop box and change it from Table to Table Barchart to view a more visual representation of this data.

P7QLR6PulvpOEG3WpypYaK_koCRcXqqNo0x-DBEnux4GSSeyrOe2ytWHmIB9FVLgc7B25yjAgGbYsrbbFKzo5erUp6gstvM3Y8A–d__0xjBdf29q8VOnvWbskFpi_fcWurvAhe_

From our analysis it is clear that Ralphs Grocery Company is not only one of the worst offenders of health code violations, but it also has very high counts of a large number of different violations. Although Levy Premium has more total violations when the data for its two owner names are combined, the violations are slightly more consolidated.