
This article has been excerpted from my book, *Models and Algorithms for Unlabelled Data.*

Next time you visit a nearby grocery store, look at how the items inside are arranged. You will find shelves with milk, eggs, bread, sugar, washing powder, soaps, fruits, vegetables, cookies and many other items neatly stacked. Have you ever wondered what logic lies behind this arrangement? Why are certain products kept near each other while others are placed far apart? The arrangement is obviously not random; there has to be some scientific reasoning behind it. Association rules can provide that reasoning. There are several algorithms for mining association rules, such as Apriori, Frequent-Pattern Growth (FP-growth), and Equivalence Class Clustering and bottom-up Lattice Traversal (ECLAT).

In this article, we will study the Equivalence class clustering and bottom-up lattice traversal algorithm.

Imagine we have a dataset and have to come up with a set of rules that can be used for practical business purposes. There are a number of ways to achieve this.

We will first explore the algorithm theoretically and then work through a more real-world example. ECLAT uses a depth-first search approach: it explores the search tree vertically, following one branch as deep as it can go before trying the next one. It starts at the root node, goes one level deeper, and continues until it reaches the first terminal node. Say this terminal node is at level X. Once it is reached, the algorithm takes a step back to level (X-1) and continues until it finds another terminal node. Let's understand this process by means of the tree diagram in Figure 1.

Figure 1 Tree diagram illustrating the process of the ECLAT algorithm. The traversal starts at node 1 and ends at node 16.

ECLAT will take the following steps:

- The algorithm starts at the root node 1.
- It then goes one level deeper to node 2.
- It continues going deeper until it reaches terminal node 11.
- Once it reaches terminal node 11, it takes a step back to node 5.
- At node 5, the algorithm checks whether there is another unexplored node it can go to. There is none.
- Hence the algorithm takes another step back and reaches node 2.
- At node 2, the algorithm explores again and finds that it can go to node 6.
- So the algorithm goes to node 6 and continues exploring until it reaches terminal node 12.
- This process continues until all the combinations have been exhausted.
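The walk above can be sketched in Python. Figure 1's exact tree is not reproduced in this text, so, as an assumption, the sketch walks the tree of item combinations for the four items used later in the article (Milk, Eggs, Bread, Cheese). `dfs_order` is a hypothetical helper name. Each combination is extended as far as possible before the walk backtracks to the next sibling, which is the depth-first order described in the steps.

```python
# Assumption: the four items from Table 1 stand in for the nodes of Figure 1.
ITEMS = ["Milk", "Eggs", "Bread", "Cheese"]


def dfs_order(prefix, start, items, visited):
    """Record item combinations in depth-first order (hypothetical helper).

    Each combination is extended with later items until no item is left
    (a terminal node); only then does the loop move on to the next sibling.
    """
    for i in range(start, len(items)):
        node = prefix + (items[i],)
        visited.append(node)
        # Go one level deeper before trying the next sibling.
        dfs_order(node, i + 1, items, visited)
    return visited


order = dfs_order((), 0, ITEMS, [])
for itemset in order:
    print(", ".join(itemset))
```

The walk reaches the terminal node {Milk, Eggs, Bread, Cheese} first, steps back to {Milk, Eggs, Bread}, finds nothing left there, steps back again to {Milk, Eggs}, and then descends to {Milk, Eggs, Cheese}: the same pattern as moving back from node 11 through node 5 to node 2 and then down to node 6.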

The speed of computation depends on the total number of distinct items in the dataset, because the number of distinct items defines the width of the tree. The items purchased in each transaction define the relationships between the nodes.
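To make the dependence on the number of distinct items concrete, the short sketch below counts how many non-empty item combinations a full depth-first walk could visit for n distinct items (2^n - 1) and how many sit at each level k of the tree (C(n, k)). These are upper bounds before any pruning by support count; the helper names are illustrative only.

```python
from math import comb


def candidates_per_level(n_items):
    """Number of k-item combinations for k = 1..n (level k of the tree)."""
    return [comb(n_items, k) for k in range(1, n_items + 1)]


def total_candidates(n_items):
    """All non-empty item combinations: 2**n - 1."""
    return 2 ** n_items - 1


for n in (4, 10, 20):
    print(f"{n} items: {total_candidates(n)} combinations, "
          f"first-level width {candidates_per_level(n)[0]}")
```

With the four items of Table 1 there are only 15 combinations; with 20 items there are over a million, which is why cutting off unpromising branches early matters.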

During the execution of ECLAT, each item is analyzed both individually and in combination with other items. Let us use the example in Table 1 to understand ECLAT better.

| Invoice Number | Milk | Eggs | Bread | Cheese |
|---|---|---|---|---|
| 1001 | 1 | 1 | 1 | 0 |
| 1002 | 0 | 1 | 1 | 1 |
| 1003 | 1 | 1 | 1 | 0 |
| 1004 | 0 | 1 | 0 | 1 |
| 1005 | 0 | 1 | 1 | 0 |

Table 1 The data set we are going to use to understand ECLAT. The first invoice (1001) has milk, eggs, and bread but no cheese.

ECLAT will undergo the following steps to analyze the dataset:

- In the first run, ECLAT finds the invoice numbers for each individual item. This is shown in Table 2, where milk is present in invoices 1001 and 1003 while eggs are present in all the invoices.

| Item | Invoice Numbers |
|---|---|
| Milk | 1001, 1003 |
| Eggs | 1001, 1002, 1003, 1004, 1005 |
| Bread | 1001, 1002, 1003, 1005 |
| Cheese | 1002, 1004 |

Table 2 Respective invoices in which each item is present. Milk is present in 1001 and 1003 while eggs are present in five invoices.
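The first run amounts to converting Table 1 from its horizontal layout (one row per invoice) into a vertical layout (one invoice list per item). A minimal Python sketch, assuming the data is held in a plain dictionary; `to_vertical` is an illustrative name, not a library function:

```python
# Table 1 in horizontal form: invoice -> 0/1 flag per item.
TRANSACTIONS = {
    1001: {"Milk": 1, "Eggs": 1, "Bread": 1, "Cheese": 0},
    1002: {"Milk": 0, "Eggs": 1, "Bread": 1, "Cheese": 1},
    1003: {"Milk": 1, "Eggs": 1, "Bread": 1, "Cheese": 0},
    1004: {"Milk": 0, "Eggs": 1, "Bread": 0, "Cheese": 1},
    1005: {"Milk": 0, "Eggs": 1, "Bread": 1, "Cheese": 0},
}


def to_vertical(transactions):
    """Map each item to the set of invoice numbers that contain it."""
    tidlists = {}
    for invoice, flags in transactions.items():
        for item, present in flags.items():
            if present:
                tidlists.setdefault(item, set()).add(invoice)
    return tidlists


tidlists = to_vertical(TRANSACTIONS)
for item in ["Milk", "Eggs", "Bread", "Cheese"]:
    print(item, sorted(tidlists[item]))
```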

- In the next step, all two-item sets are explored, as shown in Table 3. For example, milk and eggs are both present in invoices 1001 and 1003, while no invoice contains both milk and cheese.

| Item | Invoice Numbers |
|---|---|
| Milk, Eggs | 1001, 1003 |
| Milk, Bread | 1001, 1003 |
| Milk, Cheese | - |
| Eggs, Bread | 1001, 1002, 1003, 1005 |
| Eggs, Cheese | 1002, 1004 |
| Bread, Cheese | 1002 |

Table 3 Two-item sets are explored now. Milk and eggs are present in invoices 1001 and 1003, while no invoice contains both milk and cheese.
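Because each item now carries its own set of invoices, the invoices for a pair are simply the intersection of the two sets, so the original transactions do not need to be rescanned. A sketch in Python, with Table 2 copied in as sets (the variable names are illustrative):

```python
from itertools import combinations

# Single-item invoice sets copied from Table 2.
TIDLISTS = {
    "Milk": {1001, 1003},
    "Eggs": {1001, 1002, 1003, 1004, 1005},
    "Bread": {1001, 1002, 1003, 1005},
    "Cheese": {1002, 1004},
}

# The invoices for a pair are the intersection of each item's invoices.
pairs = {
    (a, b): TIDLISTS[a] & TIDLISTS[b]
    for a, b in combinations(TIDLISTS, 2)
}
for pair, invoices in pairs.items():
    # An empty intersection is printed as "-", as in Table 3.
    print(", ".join(pair), sorted(invoices) or "-")
```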

- In the next step, all three-item sets are explored, as shown in Table 4.

| Item | Invoice Numbers |
|---|---|
| Milk, Eggs, Bread | 1001, 1003 |
| Eggs, Bread, Cheese | 1002 |

Table 4 Three-item sets are analyzed in this step. Only two of the four possible combinations appear in any invoice.
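The three-item step reuses the same idea: intersect the invoice sets of all three items. The sketch below (Table 2 copied in, with a hypothetical helper `invoices_for`) checks all four possible three-item combinations, keeps the two that occur in some invoice, and also confirms that no invoice contains all four items.

```python
from itertools import combinations

# Single-item invoice sets copied from Table 2.
TIDLISTS = {
    "Milk": {1001, 1003},
    "Eggs": {1001, 1002, 1003, 1004, 1005},
    "Bread": {1001, 1002, 1003, 1005},
    "Cheese": {1002, 1004},
}


def invoices_for(itemset):
    """Invoices containing every item: the intersection of their sets."""
    return set.intersection(*(TIDLISTS[item] for item in itemset))


triples = {t: invoices_for(t) for t in combinations(TIDLISTS, 3)}
# Keep only the combinations that actually occur in some invoice.
non_empty = {t: inv for t, inv in triples.items() if inv}
for triple, invoices in non_empty.items():
    print(", ".join(triple), sorted(invoices))

# The single four-item combination: no invoice contains all of them.
print("Milk, Eggs, Bread, Cheese:", sorted(invoices_for(tuple(TIDLISTS))) or "-")
```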

- There are no invoices in our dataset that contain all four items.
- Finally, we choose the rules depending on the threshold we set for the support count. If we require a combination to appear in at least three transactions, only one combination of two or more items qualifies: {Eggs, Bread}. If we lower the threshold to two transactions, {Milk, Eggs, Bread}, {Milk, Eggs}, {Milk, Bread}, {Eggs, Bread} and {Eggs, Cheese} all qualify.
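Putting the steps together, the sketch below is one possible recursive ECLAT in Python, written for this example rather than taken from the book. It builds an invoice set per item, extends combinations depth-first by intersecting those sets, and stops extending a combination as soon as its support count falls below `min_support`, since adding items can never increase the number of invoices. The function names and the fixed item order are assumptions.

```python
# Table 1 as one set of purchased items per invoice.
TRANSACTIONS = {
    1001: {"Milk", "Eggs", "Bread"},
    1002: {"Eggs", "Bread", "Cheese"},
    1003: {"Milk", "Eggs", "Bread"},
    1004: {"Eggs", "Cheese"},
    1005: {"Eggs", "Bread"},
}
ITEM_ORDER = ["Milk", "Eggs", "Bread", "Cheese"]  # assumed column order


def eclat(prefix, candidates, min_support, found):
    """Depth-first search over item combinations using invoice-set intersections."""
    for i, (item, tids) in enumerate(candidates):
        if len(tids) < min_support:
            continue  # prune: extensions cannot appear in more invoices
        itemset = prefix + (item,)
        found[itemset] = tids
        # Next level: intersect with each remaining candidate's invoices.
        suffix = [(other, tids & other_tids)
                  for other, other_tids in candidates[i + 1:]]
        eclat(itemset, suffix, min_support, found)
    return found


def frequent_itemsets(transactions, min_support, item_order=ITEM_ORDER):
    """Return {itemset: invoices} for every combination meeting min_support."""
    vertical = {}
    for invoice, items in transactions.items():
        for item in items:
            vertical.setdefault(item, set()).add(invoice)
    candidates = [(item, vertical[item]) for item in item_order
                  if item in vertical]
    return eclat((), candidates, min_support, {})


for threshold in (3, 2):
    result = frequent_itemsets(TRANSACTIONS, threshold)
    multi = [itemset for itemset in result if len(itemset) > 1]
    print(threshold, multi)
```

With a threshold of 3 the only multi-item result is ('Eggs', 'Bread'); with a threshold of 2 it returns the same five combinations listed above.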

This is how ECLAT works through a dataset to find the frequently co-occurring item combinations from which association rules are generated; such rules are quite handy in practice. To explore the other algorithms and their Python implementations, you can refer to the book. You can also take 35% off your purchase by entering **fccverdan** into the discount code box at checkout at manning.com.

Posted 9 November 2021
