In this 5 Minute Analysis we'll preprocess, map, and explore complicated sales data for liquor stores in Iowa. Then we’ll extract the relevant latitude and longitude from a problematic column of the data and discover the city with the most sales. Next we’ll filter the data to that city and prepare the data for easy loading into Business Analysis tools such as Tableau and PowerBI. Finally we’ll use our extracted location data to visualize sales within the selected city.
Dataset: Iowa Liquor Sales
This blog post explores and analyzes the data using Pivot Billions, available freely on docker.
* I recommend allocating 7.5 GB to your docker container for this analysis since this dataset is over 12 Million rows.
The Column Preview tab allows you to see the columns and types of each dataset and modify them as you see fit. You can also view or change which column or columns are set as primary keys and determine how datasets should be joined. When you are done viewing or modifying the data structure, you can import and load the data.
For our dataset, we’ll create two new columns from the StoreLocation column: Latitude and Longitude.
We now have access to all 12,591,077 rows of the Iowa Liquor Sales data. This contains the original data in an easy tabular format along with the two new latitude and longitude features we’ve created.
By hovering over each column name you can sort the data by that column, view that column’s distribution over all of the data, filter by the data in that column, or rename that column. For example, we can view the distribution of sales by city for our data by clicking on the second-from-the-left icon (distribution) in for the City column.
As expected, the capital of Iowa (Des Moines) with the highest population of any city in the state has the highest total number of sales.
Now, for our analysis we’ll filter all of the data to only include data for the city of Des Moines.
The data is almost ready for Tableau or another business intelligence tool but the StoreLocation column is troublesome and prevents Tableau from reading the data. So we’ll go ahead and download the data without this column.
We can now easily import this data into Tableau and use its powerful visualizations to dive into Des Moines liquor sales by location.
It is clear from this SymbolMap plot that the sales for Des Moines are highest near the city center. However, it is also clear that there are many high volume locations throughout the city. We can now visualize the data however we see fit in Tableau. We could also easily remove our city filter from PivotBillions and load the whole preprocessed data into Tableau, or change our filter(s) to select a different subset or arrangement of the data.
To view and interact with this visualization or download the workbook to Tableau, see my Iowa Liquor Sales Workbook on Tableau Public.