What is an easy way to find that needle in the haystack? Why not use a control chart?Using a control chart on a data set is an efficient way to identify data sets that are out of control. A control chart is a statistical tool that is used to chart data points over time and zeros in on points that are outside of the normal range (outliers). The main components of a control chart are:
Average Line –The average value of the data set
Upper Control Limit – The calculated value of the maximum in-control value
Lower Control Limit - The calculated value of the minimum in-control value
There are many uses for control charts. For this article I will focus on monitoring data quality. Imagine you are in charge of monitoring data quality. Your team has over 10,000 metrics you measure every week. Your team usually identifies 100 metrics a week that require further research based on the metric measurement results. Is it feasible to look at every metric every week to find those 100 outliers? Besides eye fatigue, human error, and introducing the risk of missing an outlier, there is a significant cost associated with manually combing through every metric every week. The solution is to build a control chart function to identify which metrics you should take a closer look at.
There are some basics to understand before you dive into control charts. The first question to answer is “How many data points do I need to build a control chart”? The initial answer is that the number of data points needed in a control chart varies. For my control charts, I use a minimum of 14 points. You will have to decide how many points works best for your data set. If you want to include seasonality, you will use amuch larger set. “When should you recalculate the control limits”? This too varies based on your needs with your data. I recalculate the limits and the average line every week, using a rolling 14 data points. I also make sure I do NOT use the current week in my calculations. If you want a deeper dive on control charts, I recommend reading Donald Wheeler’s “Understanding Variation: The Key to Managing Chaos”. Wheeler’s book can help you decide on what parameters would work best with your specific data you are trying to build a control chart on.
Now let’s dive into the how to!
I will assume that you already have knowledge on how to load your data source into Tableau, how to create a basic worksheet, and how to create calculated fields, etc.
1. Create a Calculated Field called “Avg Line”
-Use this function: WINDOW_AVG(SUM([your count measure field]))
2. Create a Calculated Field called “UCL”
-Use function: [Avg Line]+2.66*WINDOW_STDEVP(SUM([your count measure field]) )
3. Create a Calculated Field called “LCL”
-Use function: [Avg Line]-2.66*WINDOW_STDEVP(SUM([your count measure field]) )
4. Drag your newly created Measures onto the marks window (If you haven’t yet, create your worksheet with your data set (line chart with date/time as your column).
5. In your chart that has your line marks, right-click on Y-axis and select “Add Reference Line”
6. Add the Avg Line and customize your label and how you want the line to be displayed
7. Add another reference line like before, this time you will select band and put in the previously made Calculated fields of UCL and LCL
Now, you should have an area on your chart with the average line, upper control limit and lower control limit, as you can see from my sample, I chose to have a fill in the band between the upper and lower control limit. This is only a preference.
If you want to have a dashboard that only shows the “100” weekly metrics that have an outlier, you will need to do a few more steps.
8. Create another Calculated Field called “KPI”
-IF SUM([Record Count]) > [UCL] OR SUM([your count measure field]) < [LCL]
THEN "Out of Control"
ELSE "In Control"
END
9. Now you can add “KPI” to your filter and choose “Out of Control” to only show the "Out of Control" metrics.
Again, assuming you know how to use Tableau, you can create a nice dashboard. Create a data worksheet using the filter of KPI. Create a chart worksheet (you created this in above steps). Create a dashboard that has the list on top and the chart on the bottom. Add an Action on your dashboard so that whatever row is selected on top will show the control chart below.
It is much easier to call attention of outliers/issues when you have math and a great visual on your side! This article covered a small percentage of the control chart universe. I will discuss in a future article when you should focus on a moving range control chart and ignore the standard chart containing constant growth or a constant decline.
Happy mining,
Eric Mayberry
Comment
Thanks for the comment Madison! The control chart method outlined above is a high level, easy to do control chart. I do have a more complicated, detailed version for advanced users if you are interested. The more advanced method contains color markings for trends and shifts. Let me know if you would like the code to do that. I like your post btw!
© 2017 Data Science Central Powered by
Badges | Report an Issue | Privacy Policy | Terms of Service
You need to be a member of Data Science Central to add comments!
Join Data Science Central