# Eclat

The Eclat (Equivalence Class Clustering and bottom-up Lattice Traversal) algorithm is a data mining algorithm for finding frequent itemsets in a transactional dataset. Unlike the Apriori algorithm, which uses a breadth-first search over a horizontal data format, Eclat uses a depth-first search over a vertical data format, so the support of an itemset is obtained by intersecting transaction ID sets instead of rescanning the dataset.

### Key Concepts

1. **Itemset**: A collection of one or more items.
2. **Support**: The number of transactions in the dataset that contain a particular itemset (absolute support), or that number divided by the total number of transactions (relative support).
3. **Transaction ID (TID) Set**: A set of transaction IDs where a particular itemset appears.

### Steps of the Eclat Algorithm

1. **Convert Dataset to Vertical Format**:
   - Represent the dataset in a vertical format where each item is associated with a list of transaction IDs (TIDs) in which it appears.

2. **Recursive Depth-First Search**:
   - Use a recursive depth-first search to explore itemsets.
   - Intersect TID sets to find the support of larger itemsets.

3. **Prune Non-Frequent Itemsets**:
   - Retain only itemsets whose support meets the minimum support threshold.
   - Do not extend a pruned itemset any further: none of its supersets can be frequent.

4. **Generate Frequent Itemsets**:
   - Collect every itemset that survived pruning; the size of its TID set is its support.

### Detailed Steps

**1. Converting to Vertical Format**:

For a dataset of transactions:

| Transaction ID | Items      |
|----------------|------------|
| 1              | A, B       |
| 2              | B, C       |
| 3              | A, C       |
| 4              | A, B, C    |
| 5              | A, B       |

Convert to vertical format:

- {A}: {1, 3, 4, 5}
- {B}: {1, 2, 4, 5}
- {C}: {2, 3, 4}
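
The conversion can be sketched in a few lines of Python. This is a minimal illustration on the five example transactions; the helper name `to_vertical` is an assumption made for this example, not part of any library.

```python
from collections import defaultdict

# Horizontal format: transaction ID -> items (the example table above)
transactions = {
    1: ["A", "B"],
    2: ["B", "C"],
    3: ["A", "C"],
    4: ["A", "B", "C"],
    5: ["A", "B"],
}

def to_vertical(transactions):
    """Map each item to the set of transaction IDs that contain it."""
    vertical = defaultdict(set)
    for tid, items in transactions.items():
        for item in items:
            vertical[item].add(tid)
    return dict(vertical)

vertical = to_vertical(transactions)
for item in sorted(vertical):
    print(item, sorted(vertical[item]))
```

Sets are used for the TID lists so that the intersections in the next step are a single `&` operation.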

**2. Recursive Depth-First Search**:

Start with single items and their TID sets:

- {A} → {1, 3, 4, 5}
- {B} → {1, 2, 4, 5}
- {C} → {2, 3, 4}

Combine items and intersect their TID sets to find larger itemsets:

- {A, B} → {1, 4, 5} (intersection of {A} and {B})
- {A, C} → {3, 4} (intersection of {A} and {C})
- {B, C} → {2, 4} (intersection of {B} and {C})

Combine further if needed:

- {A, B, C} → {4} (intersection of {A, B} and {C})
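
The intersections listed above can be reproduced directly with Python sets. This is only a sketch of the support computation; the variable names are chosen for this example.

```python
# TID sets of the single items, taken from the vertical format above
tidsets = {
    "A": {1, 3, 4, 5},
    "B": {1, 2, 4, 5},
    "C": {2, 3, 4},
}

# The TID set of a pair is the intersection of the TID sets of its items
ab = tidsets["A"] & tidsets["B"]
ac = tidsets["A"] & tidsets["C"]
bc = tidsets["B"] & tidsets["C"]

# A larger itemset reuses the TID set of its prefix instead of starting over
abc = ab & tidsets["C"]

# The support of an itemset is the size of its TID set
for name, tids in [("A,B", ab), ("A,C", ac), ("B,C", bc), ("A,B,C", abc)]:
    print(name, sorted(tids), "support =", len(tids))
```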

**3. Prune Non-Frequent Itemsets**:

Assuming a minimum support threshold of 2 transactions (40% support):

- {A} has support 4
- {B} has support 4
- {C} has support 3
- {A, B} has support 3
- {A, C} has support 2
- {B, C} has support 2
- {A, B, C} has support 1 → pruned (below the threshold)

**4. Generate Frequent Itemsets**:

Frequent itemsets:

- {A}, {B}, {C}, {A, B}, {A, C}, {B, C}
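
Putting the four steps together, one possible recursive formulation looks like the sketch below. It assumes absolute support counts and a fixed item order so that each itemset is generated once; the function name `eclat` and its signature are illustrative, not a library API.

```python
def eclat(prefix, candidates, min_support, frequent):
    """Depth-first search over itemsets.

    prefix      -- tuple of items already in the itemset
    candidates  -- list of (item, tidset) pairs that may extend the prefix;
                   each tidset is already restricted to the prefix
    min_support -- minimum number of transactions (absolute support)
    frequent    -- dict collecting {itemset tuple: support}
    """
    for i, (item, tids) in enumerate(candidates):
        if len(tids) < min_support:
            continue  # prune: no superset of this itemset can be frequent
        itemset = prefix + (item,)
        frequent[itemset] = len(tids)
        # Extend only with later items so each itemset is visited once
        extensions = [(other, tids & other_tids)
                      for other, other_tids in candidates[i + 1:]]
        eclat(itemset, extensions, min_support, frequent)
    return frequent

vertical = {
    "A": {1, 3, 4, 5},
    "B": {1, 2, 4, 5},
    "C": {2, 3, 4},
}

frequent = eclat((), sorted(vertical.items()), min_support=2, frequent={})
for itemset, support in frequent.items():
    print(itemset, support)
```

With a minimum support of 2, this yields the six frequent itemsets listed above and never records {A, B, C}.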

### Advantages of Eclat

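- **Efficiency**: Eclat can be more efficient than Apriori because it computes support by intersecting TID sets rather than scanning the dataset repeatedly; the benefit is most often reported for dense datasets.
- **Memory Usage**: With the vertical format and a depth-first traversal, Eclat only needs to keep the TID sets along the current search path, which can make it more memory-efficient, although the TID sets of very frequent items can themselves be large.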

### Applications of Eclat Algorithm

- **Market Basket Analysis**: Identifying sets of items that are frequently purchased together.
- **Web Usage Mining**: Discovering patterns in user navigation on websites.
- **Bioinformatics**: Finding frequent patterns in biological datasets, such as gene expression data.

### Comparison with Apriori

- **Search Strategy**: Eclat uses depth-first search, while Apriori uses breadth-first search.
- **Data Format**: Eclat works with vertical data format (item to TID sets), while Apriori works with horizontal data format (transactions to items).
- **Performance**: Eclat can be faster and more memory-efficient, especially for dense datasets.

The Eclat algorithm is a powerful tool for discovering frequent itemsets in large datasets, providing valuable insights and aiding in various applications such as market basket analysis, web usage mining, and bioinformatics.

## Importing the libraries

In [1]:
!pip install apyori

In [2]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

## Data Preprocessing

In [3]:
dataset = pd.read_csv('Market_Basket_Optimisation.csv', header = None)
# Build one list of item names per row, skipping the NaN cells that pad shorter transactions
transactions = []
for row in dataset.values:
  transactions.append([str(item) for item in row if pd.notna(item)])

## Training the Eclat model on the dataset

In [4]:
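from apyori import apriori
# apyori provides apriori only, so the pairs are mined with it here;
# only the support of each pair is kept further down, which is the Eclat view of the result
rules = apriori(transactions = transactions, min_support = 0.003, min_confidence = 0.2, min_lift = 3, min_length = 2, max_length = 2)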
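
The `apriori` call above is still an Apriori run with confidence and lift filters. For comparison, here is a minimal sketch of how pair supports could be computed in the Eclat style, using TID-set intersections and relative support only. The function `eclat_pairs` and the toy transactions are assumptions made for illustration; they are not part of `apyori`, and this sketch does not reproduce the confidence and lift filtering.

```python
from collections import defaultdict
from itertools import combinations

def eclat_pairs(transactions, min_support):
    """Return {frozenset({a, b}): relative support} for frequent pairs."""
    n = len(transactions)

    # Vertical format: item -> set of transaction indices
    tidsets = defaultdict(set)
    for tid, items in enumerate(transactions):
        for item in set(items):
            tidsets[item].add(tid)

    # An infrequent item cannot be part of a frequent pair
    frequent_items = {item: tids for item, tids in tidsets.items()
                      if len(tids) / n >= min_support}

    pairs = {}
    for a, b in combinations(sorted(frequent_items), 2):
        support = len(frequent_items[a] & frequent_items[b]) / n
        if support >= min_support:
            pairs[frozenset((a, b))] = support
    return pairs

# Toy data (the five transactions from the walkthrough), not the market basket file
toy_transactions = [["A", "B"], ["B", "C"], ["A", "C"], ["A", "B", "C"], ["A", "B"]]
pairs = eclat_pairs(toy_transactions, min_support=0.4)
for pair, support in pairs.items():
    print(sorted(pair), support)
```

The same function could be called on the `transactions` list with `min_support = 0.003`, but it would return every frequent pair, not only those that pass the lift and confidence thresholds.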

## Visualising the results

### Displaying the raw results returned by the apriori function

In [5]:
results = list(rules)

In [6]:
results

[RelationRecord(items=frozenset({'light cream', 'chicken'}), support=0.004532728969470737, ordered_statistics=[OrderedStatistic(items_base=frozenset({'light cream'}), items_add=frozenset({'chicken'}), confidence=0.29059829059829057, lift=4.84395061728395)]),
 RelationRecord(items=frozenset({'escalope', 'mushroom cream sauce'}), support=0.005732568990801226, ordered_statistics=[OrderedStatistic(items_base=frozenset({'mushroom cream sauce'}), items_add=frozenset({'escalope'}), confidence=0.3006993006993007, lift=3.790832696715049)]),
 RelationRecord(items=frozenset({'escalope', 'pasta'}), support=0.005865884548726837, ordered_statistics=[OrderedStatistic(items_base=frozenset({'pasta'}), items_add=frozenset({'escalope'}), confidence=0.3728813559322034, lift=4.700811850163794)]),
 RelationRecord(items=frozenset({'honey', 'fromage blanc'}), support=0.003332888948140248, ordered_statistics=[OrderedStatistic(items_base=frozenset({'fromage blanc'}), items_add=frozenset({'honey'}), confidence=0

### Organising the results into a Pandas DataFrame

In [7]:
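def inspect(results):
    # result[2][0] is the first ordered statistic of a record:
    # index 0 is its base itemset, index 1 the added itemset; result[1] is the support
    lhs         = [tuple(result[2][0][0])[0] for result in results]
    rhs         = [tuple(result[2][0][1])[0] for result in results]
    supports    = [result[1] for result in results]
    return list(zip(lhs, rhs, supports))

# Confidence and lift are dropped on purpose: only the support is kept
resultsinDataFrame = pd.DataFrame(inspect(results), columns = ['Product 1', 'Product 2', 'Support'])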
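
To make the positional indexing in `inspect` easier to follow, the sketch below rebuilds the first record from the printed output with stand-in named tuples. The two `namedtuple` definitions only mirror the field names visible in that output; they are defined here so the example runs without `apyori` and are not the library's own classes.

```python
from collections import namedtuple

# Stand-ins mirroring the field names shown in the printed results
OrderedStatistic = namedtuple(
    "OrderedStatistic", ["items_base", "items_add", "confidence", "lift"])
RelationRecord = namedtuple(
    "RelationRecord", ["items", "support", "ordered_statistics"])

record = RelationRecord(
    items=frozenset({"light cream", "chicken"}),
    support=0.004532728969470737,
    ordered_statistics=[
        OrderedStatistic(
            items_base=frozenset({"light cream"}),
            items_add=frozenset({"chicken"}),
            confidence=0.29059829059829057,
            lift=4.84395061728395,
        )
    ],
)

first_rule = record.ordered_statistics[0]      # same object as record[2][0]
product_1 = next(iter(first_rule.items_base))  # same value as tuple(record[2][0][0])[0]
product_2 = next(iter(first_rule.items_add))   # same value as tuple(record[2][0][1])[0]
row = (product_1, product_2, record.support)   # one row of the DataFrame
print(row)
```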

### Displaying the results sorted by descending supports

In [8]:
resultsinDataFrame.nlargest(n = 10, columns = 'Support')

|   | Product 1            | Product 2   | Support  |
|---|----------------------|-------------|----------|
| 4 | herb & pepper        | ground beef | 0.015998 |
| 7 | whole wheat pasta    | olive oil   | 0.007999 |
| 2 | pasta                | escalope    | 0.005866 |
| 1 | mushroom cream sauce | escalope    | 0.005733 |
| 5 | tomato sauce         | ground beef | 0.005333 |
| 8 | pasta                | shrimp      | 0.005066 |
| 0 | light cream          | chicken     | 0.004533 |
| 3 | fromage blanc        | honey       | 0.003333 |
| 6 | light cream          | olive oil   | 0.0032   |
