Home » Uncategorized

5 Minute Analysis: Getting the Whole Picture of Retail Sales

In this 5 Minute Analysis we’ll preprocess, join, and explore disjoint sales data for retail locations. Then we’ll dive into the relation between store type, fuel prices, and weekly sales as well as prepare the data for easy loading into Business Analysis tools such as Tableau and PowerBI. Finally, we’ll explore and analyze our preprocessed and joined data in Tableau and use its powerful visualizations to draw additional insights.

Dataset: Retail Data Analytics

This blog post explores and analyzes the data using PivotBillions, available freely on docker.

Docker Image

Steps

Load the Data and View its Structure

  1. Download the dataset from Kaggle.
  2. Unzip your downloaded data.
  3. Access the Pivot Billions URL for your machine.
  4. Click the Plus (+) icon on the bottom left hand side of the window.
  5. Select Drag & Drop.

z4vyG-9ApXWyty3tucqjRnoSrbxcpBxo2ptsHb9IEgKNS5YjXK0zlERstPAAxyGElMdKOH6ls_p6ezpD9mA4-7kn2Vo0HE95Rp8CivqXMHlelJPgPD7175-gLjMAXRFdwsBx1vaL

  1. Drag your downloaded “Features data set.csv”, “sales data-set.csv”, and 
    “stores data-set.csv” files to the Drag & Drop box in Pivot Billions.
  2. Click the dropdown arrow EdYZOO5N4Lcq8lIEuFae00yjsO-ShLS4cZaVqOGxuDVMUH7fVuXHDeIp8nAASN6D0j8e45xICDGnw5J10BoPH8m0gubWHJlZ_XFWCVeiA5t6dzL4eLk4V9VwjF2jKOgGeD7YNr8F to the right of the “sales data-set.csv” file in Pivot Billions to view the schema of the data and see a sample.

YfKl1DVZzmyF-UE7UwQssTV1LO9RF5rP_Ep_xJGGiwFy4dYwemVLjHgUyuQlhg4nPFmOZmXzckmdbqW1HVj_Ao_qr0yu7oJHchqgt1Sm8a5lt5XYDbglYHfVL0lJUebRZ_YRdiKF

  1. Then select the left checkbox (Main) next to that file and the right checkbox (Join) next to the “Features data set.csv” file and next to the “stores data-set.csv” file.
  2. Click Column Preview at the top of the screen.

aRffu3I0GRQv6KWKbPSym3PqkUnJjvJv9qkUZuMm_kAb6EDYdx_nxLmDulIR4wY8qcSDgy8RiMhHMdSW0Px0rLSfGO3HeGuWI5-8HDaCrJh3FBT9rtMpHISN5Q7oxAinVG4_YFVk

You can now see the columns and types of each dataset and modify them as you see fit. You can also view or change which column or columns are set as primary keys and determine how the datasets should be joined. When you are done viewing or modifying the data structure, you can import and load the data.

  1. Select the 0iqWeXJn0QaNTI17s57cARQC1c0C7SozVAZSIsOuBY6JDzvs0YqLUFPJtbfI4hcY1gh4raik1MxG3aTMGQ_lf3yjv60fKAIncaIh-2QuyGYtZ86PqFwmbggwlCoF5yB5Oi6STUQ- symbol next to the Store field under the Main table. This sets the Store field as the join column. You can also see that the Store field is in both of the other datasets we plan to join.
  2. Unselect the 0iqWeXJn0QaNTI17s57cARQC1c0C7SozVAZSIsOuBY6JDzvs0YqLUFPJtbfI4hcY1gh4raik1MxG3aTMGQ_lf3yjv60fKAIncaIh-2QuyGYtZ86PqFwmbggwlCoF5yB5Oi6STUQ- symbol next to the Date field under the Main table.
  3. Click Import at the top of the screen.

KxjwRaMKpS7oZZ5SSE5KdaVpua9lksAg3-LS6ivEPkYRfEW5iERPVRHe_CliDL9-8cMXqLLzaTdPxzEVu-tegn9x-wuzJmPsTQbDES-dnmi7WckOKjiezn9Bf3phqnQvB4cRYEMw

Explore and Filter the Joined Data.

We now have access to all 421,569 rows of the joined retail sales data. This contains the data and features from each of the three separate data files from our original dataset.

3db0HjaPReYsbdMOmrZvxaCulZYGt4wD2wbi2PelKFbUgAi5okCaT7dgnZcxHhe-ZdsK67vqRDICVkgpbjHR4dVK7-6uowiBBh7hktIhwdRq_1RpnOqA6smf1cKW-_P6kkI0iyoH

By hovering over each column name you can sort the data by that column, view that column’s distribution over all of the data, filter by the data in that column, or rename that column. For example, we can view the distribution of the Temperature during the sales period in our data by clicking on the second-from-the-left icon (distribution) in Jk6RGl2qi5-oNFE1dpeyIX9eA3XXBT3SgEiNsE6Vgh8xt30BcgK9kJzrZLcu0pYCX2_KyZEUjRA6RmW8CXjAmcED_Tud9sUzxn2kg9dxAz6rPjBWKDn4gxq2C2lRFpqoJ8xKTHs2 for the Temperature column.

 a0PZzwj0Qh1RqDspavoZ4P2YWYgBB0urL1Nwb_gzRv3FaK2IIIHdOX1uiWZv-cHhqRJC7nwRHT9bdzfu_adY4M71oZ-PUhzSPw8g1A7WecJhpVpByjd5zpLVN5XF7UsUlc42BQlJ

As expected it has a relatively normal distribution.

Now, for our analysis we’ll filter all of the data to get rid of Holiday sales since they are less indicative of general sales during the year.

  1. Click on the second-from-the-right icon (filter) in GI318K-fcYdZS7rJRy8nkXVEBWMU1RPt0UMj62gwQu_mBspYeZR9HYjXiE65hkbct-vafvqQDVapD1ehPvKNwHoVK_zqwvfDWYj5tjb0vpt6ZwLJEsUg8unMtwZQ3o3au1eCKVEE for the IsHoliday column.
  2. Select Equals from the dropdown and then enter “FALSE” and press enter

pzBH2yaAqJ818IQvD0awYy58kRTr01fHfwFS2meKvwKFzlgXgApmPflSU9um4q3W6vUsZSOp4_fNsiWvMrLHimNyjamlH20QuEHEhnrroDKRG0bNhAu7FWzFkKugEzER_az4sjug

All of our data is immediately filtered to data occurring on non-Holiday days.

zsSLquaYgCFSCDfhjZlD_yk1sG9dHs2waAp4HEXSKCSwmD2Ym1PENlXSYAwv5DeO09sLHA4m55aL0Do93iQkA7XAlU4ZfXFadw7XohyDKwU8nIfPlbInjnPDhZzv3Uy9VIxEdMup

Explore the Effect of Store Type and Fuel Price on Weekly Sales

We’ll now do some basic analysis to see what relationship the fuel price has on each store type’s weekly sales.

  1. Click the Pivot icon kc7f8N7bfFlsbkAT9Jorllb6LYvtsD-RFy00p-szlEEjMdg4gHhBVoWBVCdJ86WA3r_e-fKwoXAfrLetbj7twwxiFgF7XNfMVdK0b7jWH_QReIcloIzM3fo9YUsLHqH9O5ZNFF8Q in the the top right of your data table.
  2. Click the Plus (+) icon under Dimensions and select the “Type” column.
  3. Click the Plus (+) icon again and select the “Fuel_Price” column.
  4. Click the Plus (+) icon under Values and select the “Weekly_Sales” column.
  5. Click View to pivot your data.

Pivot Billions now quickly reorganizes your data by store type and fuel price. It also provides counts, sums, and statistics on the corresponding weekly sales. You can also sort by a column or filter the data. The data is sorted by Count by default.

swJjzI3qpHFqkY-5qWSU0GjE5OmDJrzbmq4ojWi-ELQCxYh33arSIl1fTU0wiDghzIHl5UqryFzkBt99BDuIuTl-T4MwlMAK6KGfoW9aBA9lgWFhfT_pvbbw_Ok8V3OJDIP3SEYv

You can see that Pivot Billions quickly organizes the data by the 2,086 unique combinations of the Type and Fuel Price Columns. We’ll now interactively view the data.

  1. Click the Switch View Type yF4beoVVnmIKLL_DDwesSGDF8bFSw5_P9Jg32G4LAx2ardLl_T-Fjj-nnPqVpNeBhS2aCiUyllHN1dSyF2E6F6P6ALOGR8f2dEj8NFlcOYVvSa33PnL8VRBy67HOFhNNIh7iiYfA icon in the top right of the pivot widget and select Pivot View.
  2. Drag the Type box to below the drop down selection box.
  3. Change the topleft dropbox from “Table” to “Horizontal Bar Chart”.
  4. Click the lower drop down selection box and change “Count” to “Average”.

iTk_kua2sOO8E7bOR2z5D3-IWh-5-rXT7oAJYvFdjmccOC2Iitn3Z2Of-cRoMa9tpN-AQ4OUTf0OqBn0OvWx3_z5gw2iq6lGXk-5ti0B3ysrE4jOVjQl33oAqqEJ8KuDMpbKtWL-

We can immediately see that Store Type A has by far the most sales of each of the store types.

Now let’s view the relationship between weekly sales by store type and fuel price.

  1. Drag the Fuel_Price box to the right of the drop down selection box as shown below.
  2. Change the topleft dropbox from “Horizontal Bar Chart”  to “Bar Chart”.
  3. Select the arrow next to Type and unselect the A and C Types.

YUttlobTMXIMVlp_TojQEpBNK4eJs2qojHI-LvWbzb-7NJp8firWddyhUyIOdpstg98viqV759XPHdznHAw3wv2s2bsuG1iLHQIULwMLnLw14tXQENsvOI4C6yllIzq3qh51vvQt

You can see an increase in sales during some periods of low fuel prices for Store Type B. However, it is unclear whether this increase is solely due to low fuel prices since the increase in sales is not present for all periods of low fuel prices.

This warrants further exploration either in Pivot Billions or another tool. In Pivot Billions we could simply add another dimension or filter to explore their effect on our results. If we want to use another tool such as Tableau or PowerBI we simply click the download button on the top right of the pivot widget, enter our filename, and select Download.

VllRToDO0PQuaEHQBpTqljpoZ-5zXB4fbD-W6P0_b0AD8azlBgd5TxCUqOte9PNKvB4jVHCMeKIIX4mfaf4NwcH7glQMLM5yHYTPjaZoJn3UtVBhTrfDmnzEttCTK0rm5bo3OMe2

Export the Data into Tableau for Additional Exploration

We can also load all or some of the data directly into Tableau, without the need to rearrange it. For efficiency we’ll look at all of the non-holiday data for the first ten departments across all stores and types.

  1. In the main table, click on the second-from-the-right icon (filter) in GI318K-fcYdZS7rJRy8nkXVEBWMU1RPt0UMj62gwQu_mBspYeZR9HYjXiE65hkbct-vafvqQDVapD1ehPvKNwHoVK_zqwvfDWYj5tjb0vpt6ZwLJEsUg8unMtwZQ3o3au1eCKVEE for the Dept column.
  2. Select Less Than or Equal To from the dropdown and then enter “10” and press enter

bEWBxh-bzaytwOB85ncBUJHCXGTabVa_JWjqtrem1uFN-aM7LBP9lKUdQyfJB0ZREYtxp6wQZnYenOhoW-sPCAikm0vaoQhccz_pMY2RXnXFEFqAxFDq00ODnjVEOMoYkLCJw1wD

  1. Now click the An4AvvOveI3IZjPpHKePBScWhTEANpA0b7ExcuTeccJXPqsrdnf7VZu_9FR28apNxs4_E0TFAx4IsaeYigICsBIHXQP8fNguHtaMTQ23f7WSp1Gpl98P_BDt9i9EY3CYmE38Kj5Y icon on the top right.
  2. Enter “RetailSales_Dept1to10_NonHoliday” for the file name and then click All Columns.
  3. When the file is ready for download click the An4AvvOveI3IZjPpHKePBScWhTEANpA0b7ExcuTeccJXPqsrdnf7VZu_9FR28apNxs4_E0TFAx4IsaeYigICsBIHXQP8fNguHtaMTQ23f7WSp1Gpl98P_BDt9i9EY3CYmE38Kj5Y at the bottom of the screen as seen below.

EKtmuJSbjM54FNLqzK04I3SFyi19xytuWmKx3QRPP0K-HRz66r8wWaqTG4VLzhPU4beMjPc_0gRi5nQEJi4zW4BLF297VoLAPFcbcJvEIBhKm04zcXZkjqo09OnoqV8xry_7IO3n

We can now easily import this data into Tableau and use its powerful visualizations to dive into the data.

  1. Unzip the “RetailSales_Dept1to10_NonHoliday.csv.zip” file we just downloaded from PivotBillions.
  2. Open Tableau and click Text under Connect.
  3. Navigate to the folder where the “RetailSales_Dept1to10_NonHoliday.csv” file is located and Open it.
  4. Then click on Sheet1.
  5. Right click Dept under Measures and select Convert to Dimension.
  6. Drag Dept to Rows and drag Type to Rows as seen below.
  7. Drag Type to Color.
  8. Drag Weekly Sales to Columns and change its Measure to Average.
  9. Click on one of the “Null” values for Type and then click Exclude.

Gthv9Q-RE6xU9QyiCe-kbXahYw2WLB8kjSkrctNHP73H2iaiO-XCI6bzYx_nP8EOnBpmgzYNzZC5d08AM5Xn2AF0mgz6kJs47HIkAP8qJcTmDZBvR8OKLDtti4ivth5Yr1bMcVFI

Visualizing the data this way makes it clear that Department 6 has the least average weekly sales of the first 10 departments for each store type. Also, Type C Stores have heavily concentrated sales in Departments 1, 2, 4, and 8 with the other departments making up an insignificant percentage of sales. This could warrant dropping some of those departments for Type C Stores or Department 6 for all store types if those departments don’t have secondary benefits such as drawing customers to the stores.

To view and interact with this visualization or download the workbook to Tableau, see my Getting the Whole Picture of Retail Sales Workbook on Tableau Public.