### Steps and Calculations for Apriori

#### Example Dataset

Let's use the following dataset of transactions:

| Transaction ID | Items Purchased               |
|----------------|-------------------------------|
| 1              | Bread, Milk                   |
| 2              | Bread, Diapers, Beer, Eggs    |
| 3              | Milk, Diapers, Beer, Coke     |
| 4              | Bread, Milk, Diapers, Beer    |
| 5              | Bread, Milk, Diapers, Coke    |


#### 1. Apriori Algorithm

##### Steps:
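1. **Generate Candidate Itemsets**:
   - Start with single items (1-itemsets).
   - Generate candidate k-itemsets by joining the frequent (k-1)-itemsets.

2. **Prune Infrequent Itemsets**:
   - Remove itemsets that do not meet the minimum support threshold.
   - Every superset of an infrequent itemset is also infrequent, so such candidates never need to be counted.

3. **Generate Association Rules**:
   - For each frequent itemset, generate rules, calculate their confidence, and keep the rules that meet the minimum confidence threshold.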
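
The generate-and-prune loop described in these steps can be sketched in plain Python. This is a minimal illustration under stated assumptions, not how `mlxtend` implements it: the function name `apriori_sketch` and the `min_count` parameter (an absolute transaction count rather than a fraction) are hypothetical, chosen only for this example.

```python
from itertools import combinations

def apriori_sketch(transactions, min_count):
    """Return {itemset: support count} for every frequent itemset."""
    baskets = [frozenset(t) for t in transactions]

    def count(candidates):
        # Support count = number of transactions that contain the candidate.
        return {c: sum(c <= b for b in baskets) for c in candidates}

    # Level 1: every distinct item is a candidate 1-itemset.
    candidates = {frozenset([item]) for b in baskets for item in b}
    frequent = {}
    k = 1
    while candidates:
        # Prune: keep only candidates that meet the minimum support count.
        level = {c: n for c, n in count(candidates).items() if n >= min_count}
        frequent.update(level)
        k += 1
        # Join: union pairs of frequent (k-1)-itemsets into k-itemsets.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Keep a candidate only if all of its (k-1)-subsets are frequent.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent

transactions = [
    ['Bread', 'Milk'],
    ['Bread', 'Diapers', 'Beer', 'Eggs'],
    ['Milk', 'Diapers', 'Beer', 'Coke'],
    ['Bread', 'Milk', 'Diapers', 'Beer'],
    ['Bread', 'Milk', 'Diapers', 'Coke'],
]

frequent = apriori_sketch(transactions, min_count=3)
for itemset, n in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), n)
```

The loop stops as soon as a level produces no candidates, which is what keeps the search from enumerating every possible subset of items.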

##### Manual Calculation:

1. **Generate 1-itemsets and Count Support**:
   - Bread: 4
   - Milk: 4
   - Diapers: 4
   - Beer: 3
   - Eggs: 1
   - Coke: 2

2. **Prune Infrequent 1-itemsets (Min Support = 0.6, i.e. 3 of 5 transactions)**:
   - Eggs (1) and Coke (2) are pruned.

3. **Generate 2-itemsets and Count Support**:
   - Candidates are built only from the frequent items Bread, Milk, Diapers, and Beer.
   - {Bread, Milk}: 3
   - {Bread, Diapers}: 3
   - {Bread, Beer}: 2
   - {Milk, Diapers}: 3
   - {Milk, Beer}: 2
   - {Diapers, Beer}: 3

4. **Prune Infrequent 2-itemsets**:
   - {Bread, Beer} and {Milk, Beer} are pruned.

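5. **Generate 3-itemsets and Count Support**:
   - {Bread, Milk, Diapers}: 2 (transactions 4 and 5)
   - {Milk, Diapers, Beer}: 2 (transactions 3 and 4; strictly, Apriori skips this candidate because its subset {Milk, Beer} is infrequent)

6. **Prune Infrequent 3-itemsets**:
   - Both 3-itemsets fall below the threshold of 3 transactions and are pruned, so no frequent 3-itemsets remain.

7. **Frequent Itemsets**:
   - 1-itemsets: {Bread}, {Milk}, {Diapers}, {Beer}
   - 2-itemsets: {Bread, Milk}, {Bread, Diapers}, {Milk, Diapers}, {Diapers, Beer}

8. **Generate Association Rules**:
   - The confidence of A → B is support(A ∪ B) / support(A).
   - {Beer} → {Diapers} with confidence = 3/3 = 1.0
   - {Diapers} → {Beer} with confidence = 3/4 = 0.75
   - Every rule between two of Bread, Milk, and Diapers has confidence = 3/4 = 0.75 in either direction.
   - {Bread, Milk} → {Diapers} is not generated because {Bread, Milk, Diapers} is not frequent; its confidence would be 2/3 ≈ 0.67.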
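
Confidence is the support count of the whole itemset divided by the support count of the antecedent. The sketch below recomputes it directly from the raw transactions; the helper names `support_count` and `confidence` are hypothetical and exist only for this illustration.

```python
transactions = [
    ['Bread', 'Milk'],
    ['Bread', 'Diapers', 'Beer', 'Eggs'],
    ['Milk', 'Diapers', 'Beer', 'Coke'],
    ['Bread', 'Milk', 'Diapers', 'Beer'],
    ['Bread', 'Milk', 'Diapers', 'Coke'],
]
baskets = [set(t) for t in transactions]

def support_count(itemset):
    # Number of transactions that contain every item in the itemset.
    return sum(set(itemset) <= b for b in baskets)

def confidence(antecedent, consequent):
    # confidence(A -> C) = support(A and C together) / support(A)
    both = set(antecedent) | set(consequent)
    return support_count(both) / support_count(antecedent)

print(confidence({'Beer'}, {'Diapers'}))   # 3 / 3
print(confidence({'Diapers'}, {'Beer'}))   # 3 / 4
```

Confidence is not symmetric: swapping antecedent and consequent changes the denominator, which is why the two rules over the same pair of items score differently.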

In [9]:
!pip install mlxtend




In [10]:
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

1. **Data Preparation**:
   - The dataset is prepared as a list of lists where each list represents a transaction.

In [11]:
# Define the dataset
transactions = [
    ['Bread', 'Milk'],
    ['Bread', 'Diapers', 'Beer', 'Eggs'],
    ['Milk', 'Diapers', 'Beer', 'Coke'],
    ['Bread', 'Milk', 'Diapers', 'Beer'],
    ['Bread', 'Milk', 'Diapers', 'Coke']
]

2. **Transaction Encoding**:
   - `TransactionEncoder` one-hot encodes the transactions into a boolean table with one row per transaction and one column per item, which is the input format `apriori` expects.

In [13]:
# One-hot encode the transactions: one boolean column per item
te = TransactionEncoder()
te_ary = te.fit(transactions).transform(transactions)
df = pd.DataFrame(te_ary, columns=te.columns_)
df

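|   | Beer  | Bread | Coke  | Diapers | Eggs  | Milk  |
|---|-------|-------|-------|---------|-------|-------|
| 0 | False | True  | False | False   | False | True  |
| 1 | True  | True  | False | True    | True  | False |
| 2 | True  | False | True  | True    | False | True  |
| 3 | True  | True  | False | True    | False | True  |
| 4 | False | True  | True  | True    | False | True  |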
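
To make the encoding step concrete, here is a plain-Python sketch that builds the same kind of boolean table by hand. It illustrates the idea only and is not `mlxtend`'s implementation; the variable names `columns` and `rows` are hypothetical, and the alphabetical column order is chosen to match the table above.

```python
transactions = [
    ['Bread', 'Milk'],
    ['Bread', 'Diapers', 'Beer', 'Eggs'],
    ['Milk', 'Diapers', 'Beer', 'Coke'],
    ['Bread', 'Milk', 'Diapers', 'Beer'],
    ['Bread', 'Milk', 'Diapers', 'Coke'],
]

# One column per distinct item, sorted alphabetically.
columns = sorted({item for t in transactions for item in t})

# One row per transaction: True where the transaction contains the item.
rows = [[item in t for item in columns] for t in transactions]

print(columns)
for row in rows:
    print(row)
```

Summing a column of this table gives the support count of that item, so the encoded form makes support counting a simple column operation.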


3. **Apriori Algorithm**:
   - The `apriori` function from `mlxtend.frequent_patterns` is applied to find frequent itemsets with a minimum support of 60%.

In [18]:
# Apply Apriori algorithm
frequent_itemsets = apriori(df, min_support=0.6, use_colnames=True)

In [19]:
# Display the frequent itemsets
print("Frequent Itemsets:")
print(frequent_itemsets)

Frequent Itemsets:
   support          itemsets
0      0.6            (Beer)
1      0.8           (Bread)
2      0.8         (Diapers)
3      0.8            (Milk)
4      0.6   (Beer, Diapers)
5      0.6  (Bread, Diapers)
6      0.6     (Bread, Milk)
7      0.6   (Milk, Diapers)


4. **Association Rules**:
   - The `association_rules` function generates rules from the frequent itemsets with a minimum confidence of 70%.

In [20]:
# Generate the association rules
rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=0.7)

In [21]:
# Display the association rules
print("\nAssociation Rules:")
rules


Association Rules:


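|   | antecedents | consequents | antecedent support | consequent support | support | confidence | lift   | leverage | conviction | zhangs_metric |
|---|-------------|-------------|--------------------|--------------------|---------|------------|--------|----------|------------|---------------|
| 0 | (Beer)      | (Diapers)   | 0.6                | 0.8                | 0.6     | 1.0        | 1.25   | 0.12     | inf        | 0.5           |
| 1 | (Diapers)   | (Beer)      | 0.8                | 0.6                | 0.6     | 0.75       | 1.25   | 0.12     | 1.6        | 1.0           |
| 2 | (Bread)     | (Diapers)   | 0.8                | 0.8                | 0.6     | 0.75       | 0.9375 | -0.04    | 0.8        | -0.25         |
| 3 | (Diapers)   | (Bread)     | 0.8                | 0.8                | 0.6     | 0.75       | 0.9375 | -0.04    | 0.8        | -0.25         |
| 4 | (Bread)     | (Milk)      | 0.8                | 0.8                | 0.6     | 0.75       | 0.9375 | -0.04    | 0.8        | -0.25         |
| 5 | (Milk)      | (Bread)     | 0.8                | 0.8                | 0.6     | 0.75       | 0.9375 | -0.04    | 0.8        | -0.25         |
| 6 | (Milk)      | (Diapers)   | 0.8                | 0.8                | 0.6     | 0.75       | 0.9375 | -0.04    | 0.8        | -0.25         |
| 7 | (Diapers)   | (Milk)      | 0.8                | 0.8                | 0.6     | 0.75       | 0.9375 | -0.04    | 0.8        | -0.25         |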
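
The lift, leverage, and conviction columns follow from the supports alone. The sketch below recomputes them from the raw transactions using the standard definitions (lift = confidence / support(C), leverage = support(A ∪ C) − support(A)·support(C), conviction = (1 − support(C)) / (1 − confidence)). The helper name `rule_metrics` is hypothetical, and Zhang's metric is left out to keep the example short.

```python
import math
from fractions import Fraction

transactions = [
    ['Bread', 'Milk'],
    ['Bread', 'Diapers', 'Beer', 'Eggs'],
    ['Milk', 'Diapers', 'Beer', 'Coke'],
    ['Bread', 'Milk', 'Diapers', 'Beer'],
    ['Bread', 'Milk', 'Diapers', 'Coke'],
]
baskets = [set(t) for t in transactions]

def support(itemset):
    # Fraction of transactions containing every item (exact arithmetic).
    return Fraction(sum(set(itemset) <= b for b in baskets), len(baskets))

def rule_metrics(antecedent, consequent):
    s_a = support(antecedent)
    s_c = support(consequent)
    s_ac = support(set(antecedent) | set(consequent))
    conf = s_ac / s_a
    # Conviction divides by (1 - confidence), so it is infinite at confidence 1.
    conviction = math.inf if conf == 1 else (1 - s_c) / (1 - conf)
    return {
        'confidence': float(conf),
        'lift': float(conf / s_c),
        'leverage': float(s_ac - s_a * s_c),
        'conviction': float(conviction),
    }

print(rule_metrics({'Beer'}, {'Diapers'}))
print(rule_metrics({'Bread'}, {'Milk'}))
```

A lift above 1 (as for Beer and Diapers) means the items co-occur more often than independence would predict, while a lift below 1 (as for Bread and Milk) means they co-occur slightly less often, even though the rule's confidence is 0.75.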
