Home » Uncategorized

5 Minute Analysis: Simplifying Iowa Liquor Sales

In this 5 Minute Analysis we’ll preprocess, map, and explore complicated sales data for liquor stores in Iowa. Then we’ll extract the relevant latitude and longitude from a problematic column of the data and discover the city with the most sales. Next we’ll filter the data to that city and prepare the data for easy loading into Business Analysis tools such as Tableau and PowerBI. Finally we’ll use our extracted location data to visualize sales within the selected city.

Dataset: Iowa Liquor Sales

This blog post explores and analyzes the data using Pivot Billions, available freely on docker.

* I recommend allocating 7.5 GB to your docker container for this analysis since this dataset is over 12 Million rows.

Steps

Add the Data and View its Structure

  1. Download the dataset from Kaggle.
  2. Access the Pivot Billions URL for your machine.
  3. Click the Plus (+) icon on the bottom left hand side of the window.
  4. Select Drag & Drop.

esTHecDbDZB5eJVasw1bXbYCbowxPKPEexP9JWL234KMDOxF3kmybe4yU9gcWa7U5wG_YQdJkTHoQNfAtOD9A0lBNjc2AYxnyfBUxWoT1LyoFlKmZAm3oWmTV5ik_PAXhJbQZ6lw

  1. Drag your downloaded “Iowa_Liquor_Sales.csv.zip” file to the Drag & Drop box in Pivot Billions.
  2. Click the dropdown arrow 6D_XNUdSqSX5az4fcdQmmQqRmRR21cnAat2dc-UXtONk7Sk-nV39mt8QhUWnEL_bPkx6EE3yRiwh6ithi-CMvXOO25P7p2UKwIj8qA-rhiKAezllvNlMTsP8zwxThMjFLyYZm6qE to the right of that file in Pivot Billions to view the schema of the data and see a sample.
  3. Then click Column Preview at the top of the screen.

e1bfeKgP7hx2u_mQ0vB99uDtxbhqiqFQGcUcuOZCwKUkQ9B7fCzTrvJagvyvQurnzOkbZOeuHRtHq-RPLpDuM_8yyNnQHnOh31mcGLkzGfvw5LBEfYJqARzFeLET-TWsnjkmHuOV

Perform ETL Operations on and Load the Data

The Column Preview tab allows you to see the columns and types of each dataset and modify them as you see fit. You can also view or change which column or columns are set as primary keys and determine how datasets should be joined. When you are done viewing or modifying the data structure, you can import and load the data.

For our dataset, we’ll create two new columns from the StoreLocation column: Latitude and Longitude.

  1. Click the Plus (+) icon on the bottom left hand side of the window to add a new ETL box.
  2. Click the 11n9eT2YTdB4Kurd44s7rPoCcUmEGfmhLKtm0BtKVGV-gN2N_p4pT6yyi68JEexPx03Px-TyzFjL7vayPSruQ31KQ2QTUBHq-zKdwAqPbNnG6RaaHxUapuY6eiePtLpb2ctptavA icon to the left of the ETL box to switch to Advanced ETL mode.
  3. Enter latitude into the Column Name box.
  4. Enter -eval s:latitude ‘””‘ -eval – ‘RxMap(StoreLocation,”^.*(\(.*\),\(.*\)).*$”,latitude,”%%1%%”)’ into the Ess Syntax box.
  5. Click the Plus (+) icon again to create a second ETL box.
  6. Click the 11n9eT2YTdB4Kurd44s7rPoCcUmEGfmhLKtm0BtKVGV-gN2N_p4pT6yyi68JEexPx03Px-TyzFjL7vayPSruQ31KQ2QTUBHq-zKdwAqPbNnG6RaaHxUapuY6eiePtLpb2ctptavA icon to the left of the new ETL box to switch to Advanced ETL mode.
  7. Enter longitude into the Column Name box.
  8. Enter -eval s:longitude ‘””‘ -eval – ‘RxMap(StoreLocation,”^.*(\(.*\),\(.*\)).*$”,longitude,”%%2%%”)’ into the Ess Syntax box.
  9. Confirm the output is what is expected by clicking Check.
  10. Click Import at the top of the screen.

A8IQTOwtPStRQUr8GIkV6tWL917RxxVEugdvsxBaPE1gx7nV-fBLjsfN2mC-FZmOAKg2DYY6d7exmEGbaW0kLHhIiLPMjuRElUkzVfAbju-hB9FtTE0ciAhRg8a4m2T8AHmnvqom

Explore, Filter, and Download the Wrangled Data.

We now have access to all 12,591,077 rows of the Iowa Liquor Sales data. This contains the original data in an easy tabular format along with the two new latitude and longitude features we’ve created.

ym3nJleIuRjLRD2D6BGXZR8jpDyxOAEVw0LIaiQyPpspyQi5BzqMGBPGS2iMQL9hXOjmcbeIj3nKjcRNC23HieleUWBZu9jB4Bf7q8NIOLzg73txWkR62rzrhtm8BBif6z3aAq1k

By hovering over each column name you can sort the data by that column, view that column’s distribution over all of the data, filter by the data in that column, or rename that column. For example, we can view the distribution of sales by city for our data by clicking on the second-from-the-left icon (distribution) in a0P4nAQ700sTKMDLQdE9gzWkYYuz1t7Yzp7vFI8Yyr0TslTEW4YDvjhCpvmyweO3A8JtbiRyy_eZxFyo9GFSdFG4xPsC4trvMcWZY9HZNGlWqmKtewwb1Oggw2qLNlNmIX17PkOy for the City column.

_Mt1UNQQH_EIEODuOl5ZmlIK9D30SP5yvXZdueLQfrA-H8iuwYq9nEaAUbySzqbBe5f2EARGeKBFmnSIPnto2jxSuX1kxVjdOK7zclCHX_jouZ6l–RlTlm60thqxlP4V1UVeG5o

As expected, the capital of Iowa (Des Moines) with the highest population of any city in the state has the highest total number of sales.

Now, for our analysis we’ll filter all of the data to only include data for the city of Des Moines.

  1. Click on the second-from-the-right icon (filter) in bEKSFC8zINJmBTUDHqpYNo7iYNw5pGxCmropvfBeCABlJnvkKboYb4TsYpblAgsfldnXX_NiMqkX7-NhKQqNLqfK_8F8TFqOPen4shLhHbX0A4Wwjlisfgh0JjjWzoyMs3J3msA9 for the City column.
  2. Select Equals from the dropdown and then enter “Des Moines” and press enter. All of our data is immediately filtered to just Des Moines data.

The data is almost ready for Tableau or another business intelligence tool but the StoreLocation column is troublesome and prevents Tableau from reading the data. So we’ll go ahead and download the data without this column.

  1. Click the VIlonNlK6gJaoVtjByzCqidmoifa0WWgKoAG8MzwKkpVN2YNKuZVgSaxLMAnqlH1CJbkpQr90vFmf4DeQ9D35x1VjrOxu9zsmLEZ5GNPoPx94Gf03ccW-XYFiNORiuggpa7uhtr2 icon on the top right.
  2. Unselect the box next to StoreLocation to remove that column from your selected data.

7WDOyl5yBmVOxSLUs2HAnevkFt-SnJ3-9-nonwrkxqiKaqg-CwM9fuYYIfcfgDsrvTuIuJv752tju1N67X8kkhy3J-dZ1uX6KdQ2gOR4_tqG2UHCyaQ20nYoVl5-mRtdG1RJOSwn

  1. Now click the IxUwZN1KAk7uyYHUSOlFlujOWszAbydyEIWsZIuFsyhi4A0oh0ubUsepPgdgtn8hD5dpE1lUERqJ1v-I_SMhJr4ptqjtmuVr_TQEhpCVRDMSfnbOajTnMWbERHE-nmio3LyVcllS icon on the top right.
  2. Enter “DesMoinesSales” for the file name and then click Selected Columns.
  3. When the file is ready for download click the IxUwZN1KAk7uyYHUSOlFlujOWszAbydyEIWsZIuFsyhi4A0oh0ubUsepPgdgtn8hD5dpE1lUERqJ1v-I_SMhJr4ptqjtmuVr_TQEhpCVRDMSfnbOajTnMWbERHE-nmio3LyVcllS at the bottom of the screen as seen below.

gDVng1vU3JZFxSyauIgvLp9ZdSWWwwfqnWOeNGUArGxrmeYO2sP4_wCxAcEzxEzw4WbFF5SXeVSSpXCX2RbuGJlstc6uykNsvyM0W04m4FD1459HrmtuF5-bQ74d_CKP4jxrCdtK

Import and Graph the Data in Tableau

We can now easily import this data into Tableau and use its powerful visualizations to dive into Des Moines liquor sales by location.

  1. Unzip the “DesMoinesSales.csv.zip” file we just downloaded from PivotBillions.
  2. Open Tableau and click Text under Connect.
  3. Navigate to the folder where the “DesMoinesSales.csv” file is located and Open it.
  4. Then click on Sheet1.
  5. Right click Latitude under Measures and select Convert to Discrete and then Convert to Dimension.
  6. Repeat this for Longitude.
  7. Drag Longitude to Columns and Drag Latitude to Rows as seen below.
  8. Drag Sale Dollars to Size and drag whatever features you want to Detail.

yimcAFKEaKh3DveYlBHXVYnZYEkUtL5wrkZdd907csrN5XVihzDhAen4de4j8L2Mj3LPijFnXH3VdYf0mgGS1siuy1iPaj90K6MzuvT5ri5EW_Fydph2RFMKTWGH2MtlmnvGqpf5

It is clear from this SymbolMap plot that the sales for Des Moines are highest near the city center. However, it is also clear that there are many high volume locations throughout the city. We can now visualize the data however we see fit in Tableau. We could also easily remove our city filter from PivotBillions and load the whole preprocessed data into Tableau, or change our filter(s) to select a different subset or arrangement of the data.

To view and interact with this visualization or download the workbook to Tableau, see my Iowa Liquor Sales Workbook on Tableau Public.