In today's post, we look at association rules for market basket analysis and discuss three numeric measures that should be considered before acting on, or making a business decision based on, associations that have been observed in the data: (1) Support, (2) Confidence and (3) Lift.
Association rules are typically written in the format:
Left hand side Implies Right hand side
The left hand side is referred to as the Antecedent and the right hand side as the Consequent. An antecedent is something that logically precedes another thing, while a consequent is something that follows as a result. For example, in the association rule:
{Butter, Eggs} Implies {Bread}
Butter and eggs are the Antecedent while Bread is the Consequent. This rule means that if you were to pick a shopping cart at random and find butter and eggs in it, you would be likely to find bread as well.
The numeric measures mentioned above (Support, Confidence and Lift) are used to measure the chance that this rule holds true. So let's get straight into understanding these measures in detail. In order to do this, let's take the following sample of market baskets:
Let us now investigate the following rules:
A implies B
A implies C
C implies D
B & C imply E
Support
Support refers to the percentage of baskets where the rule was true, i.e. where both the left side and the right side products were present.
Let us review our market baskets and look for support for the rules that we want to investigate:
A implies B: Support = 1 / 5 = 0.2
A implies C: Support = 2 / 5 = 0.4
C implies D: Support = 2 / 5 = 0.4
B & C imply E: Support = 1 / 5 = 0.2
Since we have a total of five market baskets, the denominator always equals 5.
In the first rule, A implies B, we have only one basket where A and B are both present. Therefore, the support for this rule is 1 / 5.
Similarly for A implies C, since there are 2 baskets that contain A and C, the support for this rule is 2 / 5.
For C implies D, there are 2 baskets where C and D are both present so support for this rule is also 2 / 5.
And finally, there is only 1 basket that contains B, C and E so support for the rule B & C imply E is 1 / 5.
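The support calculation above can be sketched in a few lines of Python. The five baskets below are hypothetical: they are chosen only so that every count quoted in this post comes out the same, and they should not be read as the original sample. The names `BASKETS`, `RULES` and `support` are likewise just illustrative.

```python
# Hypothetical baskets, chosen only to be consistent with the counts
# quoted in this post; they are not the post's original sample table.
BASKETS = [
    {"A", "B", "C"},
    {"A", "C", "D"},
    {"B", "C", "D"},
    {"A", "D", "E"},
    {"B", "C", "E"},
]

# Each rule is written as (left hand side, right hand side).
RULES = [({"A"}, {"B"}), ({"A"}, {"C"}), ({"C"}, {"D"}), ({"B", "C"}, {"E"})]


def support(items, baskets):
    """Fraction of baskets that contain every product in `items`."""
    items = set(items)
    return sum(items <= basket for basket in baskets) / len(baskets)


for lhs, rhs in RULES:
    # Support of a rule counts baskets holding both sides together.
    print(sorted(lhs), "implies", sorted(rhs), "support =", support(lhs | rhs, BASKETS))
```

Note that support is symmetric: it depends only on the combined set of products, so A implies B and B implies A always have the same support.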
Confidence
Confidence measures what percentage of baskets that contain the product on the left hand side also contain the product on the right hand side.
Let us review our market baskets and look for confidence in the rules that we want to investigate:
A implies B: Confidence = 1 / 3 = 0.33
A implies C: Confidence = 2 / 3 = 0.67
C implies D: Confidence = 2 / 4 = 0.5
B & C imply E: Confidence = 1 / 3 = 0.33
In our first rule, we have 3 baskets that contain A. Of these 3 baskets, only 1 basket also contains B. Therefore, the confidence in this rule is 1 / 3.
In our second rule, we have 3 baskets that contain A. Of these 3 baskets, 2 baskets also contain C. Therefore, the confidence in this rule is 2 / 3.
In our third rule, we have 4 baskets that contain C. Of these 4 baskets, 2 baskets also contain D. Therefore, the confidence in this rule is 2 / 4.
In our fourth rule, we have 3 baskets that contain B & C. Of these 3 baskets, only 1 basket also contains E. Therefore, the confidence in this rule is 1 / 3.
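The same confidence figures can be reproduced with a short Python sketch. As before, the baskets are hypothetical and are chosen only to match the counts quoted in this post; `count` and `confidence` are illustrative helper names.

```python
# Hypothetical baskets, chosen only to be consistent with the counts
# quoted in this post; they are not the post's original sample table.
BASKETS = [
    {"A", "B", "C"},
    {"A", "C", "D"},
    {"B", "C", "D"},
    {"A", "D", "E"},
    {"B", "C", "E"},
]


def count(items, baskets):
    """Number of baskets that contain every product in `items`."""
    items = set(items)
    return sum(items <= basket for basket in baskets)


def confidence(lhs, rhs, baskets):
    """Of the baskets containing `lhs`, the fraction that also contain `rhs`."""
    lhs, rhs = set(lhs), set(rhs)
    lhs_count = count(lhs, baskets)
    if lhs_count == 0:
        raise ValueError("left hand side never appears in the baskets")
    return count(lhs | rhs, baskets) / lhs_count


for lhs, rhs in [("A", "B"), ("A", "C"), ("C", "D"), ("BC", "E")]:
    # Strings are treated as sets of single-letter products, so "BC" is {B, C}.
    print(lhs, "implies", rhs, "confidence =", round(confidence(lhs, rhs, BASKETS), 2))
```

Unlike support, confidence is directional: with these baskets, A implies C has a confidence of 2 / 3, while C implies A has a confidence of 2 / 4.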
Lift
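Lift, as calculated in this post, measures how much more frequently the products on the left hand side are found with the product on the right hand side than without it: the number of baskets that contain both sides divided by the number of baskets that contain the left hand side but not the right hand side. Note that most references and software packages define lift differently, as the confidence of the rule divided by the support of the right hand side, so the values in this section are not directly comparable with lift figures reported by such tools.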
Let us review our market baskets and look for lift in the rules that we want to investigate:
A implies B: Lift = 1 / 2 = 0.5
A implies C: Lift = 2 / 1 = 2
C implies D: Lift = 2 / 2 = 1
B & C imply E: Lift = 1 / 2 = 0.5
In our first rule, we have 1 basket that contains A and B. We have 2 baskets that contain A but do not contain B. Therefore, the lift from this rule is 1 / 2.
In our second rule, we have 2 baskets that contain A and C. We also have 1 basket that contains A but does not contain C. Therefore, the lift from this rule is 2 / 1.
In our third rule, we have 2 baskets that contain C and D. We also have 2 baskets that contain C but not D. Therefore, the lift from this rule is 2 / 2.
In our fourth rule, we have 1 basket that contains B, C and E. We also have 2 baskets that contain B & C but not E. Therefore, the lift from this rule is 1 / 2.
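The ratio used in this section can be sketched in Python as follows, alongside the more common textbook definition of lift (confidence divided by the support of the right hand side) for comparison. The baskets are hypothetical and chosen only to match the counts quoted in this post, so the textbook values depend on that assumption; `with_vs_without` and `standard_lift` are illustrative names.

```python
# Hypothetical baskets, chosen only to be consistent with the counts
# quoted in this post; they are not the post's original sample table.
BASKETS = [
    {"A", "B", "C"},
    {"A", "C", "D"},
    {"B", "C", "D"},
    {"A", "D", "E"},
    {"B", "C", "E"},
]


def count(items, baskets):
    """Number of baskets that contain every product in `items`."""
    items = set(items)
    return sum(items <= basket for basket in baskets)


def with_vs_without(lhs, rhs, baskets):
    """Ratio used in this post: baskets with both sides / baskets with lhs only."""
    both = count(set(lhs) | set(rhs), baskets)
    lhs_only = count(lhs, baskets) - both
    return float("inf") if lhs_only == 0 else both / lhs_only


def standard_lift(lhs, rhs, baskets):
    """Textbook lift: confidence(lhs -> rhs) / support(rhs)."""
    conf = count(set(lhs) | set(rhs), baskets) / count(lhs, baskets)
    return conf / (count(rhs, baskets) / len(baskets))


for lhs, rhs in [("A", "B"), ("A", "C"), ("C", "D"), ("BC", "E")]:
    print(lhs, "implies", rhs,
          "ratio =", with_vs_without(lhs, rhs, BASKETS),
          "textbook lift =", round(standard_lift(lhs, rhs, BASKETS), 2))
```

Under the textbook definition, a value above 1 means the two sides appear together more often than they would if they were independent, and a value below 1 means they appear together less often.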
Other numeric measures
Other numeric measures that are used to measure the strength of association rules include:
* All confidence
* Collective strength
* Conviction
* Leverage
A detailed discussion of these and other measures can be found here.
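Two of the measures listed above have short closed forms and can be sketched in Python. Leverage is taken here as the support of the rule minus the product of the supports of its two sides, and conviction as (1 - support of the right hand side) / (1 - confidence). The baskets are hypothetical, chosen only to match the counts quoted in this post, and the function names are illustrative.

```python
# Hypothetical baskets, chosen only to be consistent with the counts
# quoted in this post; they are not the post's original sample table.
BASKETS = [
    {"A", "B", "C"},
    {"A", "C", "D"},
    {"B", "C", "D"},
    {"A", "D", "E"},
    {"B", "C", "E"},
]


def support(items, baskets):
    """Fraction of baskets that contain every product in `items`."""
    items = set(items)
    return sum(items <= basket for basket in baskets) / len(baskets)


def leverage(lhs, rhs, baskets):
    """Observed joint support minus the support expected under independence."""
    lhs, rhs = set(lhs), set(rhs)
    return support(lhs | rhs, baskets) - support(lhs, baskets) * support(rhs, baskets)


def conviction(lhs, rhs, baskets):
    """(1 - support(rhs)) / (1 - confidence); infinite when confidence is 1."""
    lhs, rhs = set(lhs), set(rhs)
    conf = support(lhs | rhs, baskets) / support(lhs, baskets)
    if conf == 1:
        return float("inf")
    return (1 - support(rhs, baskets)) / (1 - conf)


print("A implies C: leverage =", round(leverage("A", "C", BASKETS), 3),
      "conviction =", round(conviction("A", "C", BASKETS), 3))
```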
User defined significance levels
To be useful, association rules need to satisfy user defined significance levels. There are no standard thresholds that need to be met; all thresholds are user defined. Rules are usually formed when:
1) User defined significance level for support is met; and
2) User defined significance level for confidence is met.
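The two conditions above can be sketched as a simple filter in Python. This is a brute-force enumeration for illustration only, not the Apriori algorithm, which prunes the search instead of trying every combination. The baskets are hypothetical (chosen only to match the counts quoted in this post), and the threshold values 0.4 and 0.6 and the `max_lhs` parameter are arbitrary assumptions.

```python
from itertools import combinations

# Hypothetical baskets, chosen only to be consistent with the counts
# quoted in this post; they are not the post's original sample table.
BASKETS = [
    {"A", "B", "C"},
    {"A", "C", "D"},
    {"B", "C", "D"},
    {"A", "D", "E"},
    {"B", "C", "E"},
]


def rules_meeting_thresholds(baskets, min_support, min_confidence, max_lhs=2):
    """Brute-force search for single-product consequents meeting both thresholds."""
    products = sorted(set().union(*baskets))
    total = len(baskets)
    found = []
    for size in range(1, max_lhs + 1):
        for lhs in combinations(products, size):
            lhs_count = sum(set(lhs) <= basket for basket in baskets)
            if lhs_count == 0:
                continue  # confidence is undefined if lhs never occurs
            for rhs in products:
                if rhs in lhs:
                    continue
                both = sum((set(lhs) | {rhs}) <= basket for basket in baskets)
                supp, conf = both / total, both / lhs_count
                if supp >= min_support and conf >= min_confidence:
                    found.append((lhs, rhs, supp, conf))
    return found


# Example thresholds (arbitrary): support of at least 0.4, confidence of at least 0.6.
for lhs, rhs, supp, conf in rules_meeting_thresholds(BASKETS, 0.4, 0.6):
    print(lhs, "implies", rhs, "support =", supp, "confidence =", round(conf, 2))
```

Raising either threshold shrinks the list of rules; lowering them lets through rules that rest on very few baskets.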
The Apriori algorithm is particularly useful in identifying these measures; an example is provided below:
The web graph node in SPSS is very useful in getting a visual representation of the relationships; an example is provided below:
Comment
@ Dr. Theophano Mitsa: thank you for your feedback. It was nice meeting you yesterday. I look forward to meeting you at the next meetup and learning more about Temporal Data Mining.
@ Pete Mancini: thank you for your feedback.
Nice post. I also enjoyed your talk yesterday at the "Data Scientist" meetup.
Great post. To the point and a useful analysis for a variety of circumstances!
Thank you Djoni. The graph shows items that are bought together. The stronger lines (red color) depict deeper or more frequent relationships, whereas the lighter lines depict products that are less frequently bought together. Hope this helps.
Excellent!
- how do we interpret the graph?
© 2019 Data Science Central ® Powered by