# Apriori

The Apriori algorithm is a classic data mining algorithm for finding frequent itemsets and deriving association rules from large transaction datasets. It was introduced by Rakesh Agrawal and Ramakrishnan Srikant in 1994.

### Objectives of Apriori

- **Frequent Itemsets**: Identify sets of items that appear frequently together in a dataset.
- **Association Rules**: Generate rules that predict the occurrence of an item based on the occurrences of other items.

### Key Concepts

1. **Support**: The proportion of transactions in the dataset that contain a particular itemset.

   \[
   \text{Support}(A) = \frac{\text{Number of transactions containing } A}{\text{Total number of transactions}}
   \]

2. **Confidence**: The proportion of transactions containing itemset A that also contain itemset B.

   \[
   \text{Confidence}(A \rightarrow B) = \frac{\text{Support}(A \cup B)}{\text{Support}(A)}
   \]


3. **Lift**: The ratio of the observed support to that expected if A and B were independent.

   \[
   \text{Lift}(A \rightarrow B) = \frac{\text{Support}(A \cup B)}{\text{Support}(A) \times \text{Support}(B)}
   \]
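
The three metrics can be computed directly from their definitions. The sketch below is a minimal illustration; the function names and the toy `baskets` list are assumptions made for this example, not part of any library.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Support of the combined itemset divided by support of the antecedent."""
    both = frozenset(antecedent) | frozenset(consequent)
    return support(both, transactions) / support(antecedent, transactions)


def lift(antecedent, consequent, transactions):
    """Confidence of the rule divided by support of the consequent."""
    return confidence(antecedent, consequent, transactions) / support(consequent, transactions)


# Hypothetical toy baskets, not taken from any real dataset
baskets = [
    frozenset(["milk", "bread"]),
    frozenset(["milk"]),
    frozenset(["bread", "eggs"]),
    frozenset(["milk", "bread", "eggs"]),
]

print(support({"milk", "bread"}, baskets))       # 2 of 4 baskets
print(confidence({"milk"}, {"bread"}, baskets))  # 2 of the 3 milk baskets
print(lift({"milk"}, {"bread"}, baskets))
```

Dividing confidence by the consequent's support is algebraically the same as the lift formula above, since confidence already contains Support(A ∪ B) / Support(A).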

### The Apriori Algorithm Steps

1. **Generate Candidate Itemsets**:
   - Start with single-item itemsets (1-itemsets).
   - Use these to generate larger itemsets (k-itemsets) by combining them.

2. **Prune Non-Frequent Itemsets**:
   - Count the support of each candidate itemset.
   - Keep only the itemsets that meet the minimum support threshold and discard the rest.

3. **Generate Association Rules**:
   - From the frequent itemsets, generate rules that meet a minimum confidence threshold.
   - Evaluate these rules using metrics like support, confidence, and lift.

### Detailed Steps

**1. Generating Candidate Itemsets:**

   - **Initialization**: Start with all items in the dataset as 1-itemsets.
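   - **Iteration**: Generate candidate k-itemsets by joining pairs of frequent (k-1)-itemsets that share (k-2) items. A candidate is worth counting only if all of its (k-1)-item subsets are frequent, because any superset of an infrequent itemset is itself infrequent.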

**2. Pruning:**

   - After generating k-itemsets, count their occurrences in the dataset.
   - Prune itemsets that do not meet the minimum support threshold.
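
The candidate generation and pruning steps above can be sketched in plain Python as follows. This is a minimal illustration under simplifying assumptions (everything fits in memory, support is recounted by scanning all transactions), not the implementation of any particular library; the name `frequent_itemsets` and the five-transaction sample are made up for the example.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support_of(candidate):
        return sum(1 for t in transactions if candidate <= t) / n

    # Level 1: every distinct item is a candidate 1-itemset.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Count step: keep only candidates that meet the threshold.
        level = {}
        for candidate in candidates:
            s = support_of(candidate)
            if s >= min_support:
                level[candidate] = s
        frequent.update(level)

        # Join step: merge frequent k-itemsets whose union has k + 1 items.
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}

        # Prune step: every (k-1)-item subset of a candidate must be frequent.
        candidates = {
            c for c in joined
            if all(frozenset(sub) in level for sub in combinations(c, k - 1))
        }
    return frequent


# Hypothetical sample: five small transactions over items A, B, C
sample = [["A", "B"], ["B", "C"], ["A", "C"], ["A", "B", "C"], ["A", "B"]]
result = frequent_itemsets(sample, min_support=0.6)
for itemset in sorted(result, key=lambda s: (len(s), sorted(s))):
    print(sorted(itemset), result[itemset])
```

The loop stops as soon as a level produces no candidates, which is what keeps the search from enumerating every possible itemset.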

**3. Generating Association Rules:**

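   - For each frequent itemset, generate every rule that splits it into a non-empty antecedent and a non-empty consequent.
   - For a frequent itemset {A, B, C}, the possible rules are:
     - {A} → {B, C}
     - {B} → {A, C}
     - {C} → {A, B}
     - {A, B} → {C}
     - {A, C} → {B}
     - {B, C} → {A}
   - Single-item rules such as {A} → {B} come from the frequent 2-itemsets, here {A, B}, {A, C}, and {B, C}.
   - Calculate confidence for each rule and prune rules that do not meet the minimum confidence threshold.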
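
Rule generation for a single frequent itemset can be sketched as below: each non-empty proper subset becomes an antecedent and the remaining items become the consequent. The function name `rules_from_itemset` and the `supports` dictionary are assumptions for illustration; the dictionary is assumed to already hold the support of the itemset and of every subset used as an antecedent.

```python
from itertools import combinations


def rules_from_itemset(itemset, supports, min_confidence):
    """Return (antecedent, consequent, confidence) tuples for one frequent itemset."""
    itemset = frozenset(itemset)
    rules = []
    for size in range(1, len(itemset)):
        for combo in combinations(sorted(itemset), size):
            antecedent = frozenset(combo)
            consequent = itemset - antecedent
            conf = supports[itemset] / supports[antecedent]
            if conf >= min_confidence:
                rules.append((antecedent, consequent, conf))
    return rules


# Hypothetical supports for a two-item frequent itemset
supports = {
    frozenset("A"): 0.8,
    frozenset("B"): 0.8,
    frozenset("AB"): 0.6,
}
for antecedent, consequent, conf in rules_from_itemset("AB", supports, min_confidence=0.7):
    print(sorted(antecedent), "->", sorted(consequent), round(conf, 2))
```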

### Example

Suppose we have a dataset of transactions:

| Transaction ID | Items      |
|----------------|------------|
| 1              | A, B       |
| 2              | B, C       |
| 3              | A, C       |
| 4              | A, B, C    |
| 5              | A, B       |

**Step 1: Generate Candidate Itemsets**

- 1-itemsets: {A}, {B}, {C}
- 2-itemsets: {A, B}, {A, C}, {B, C}
- 3-itemset: {A, B, C}

**Step 2: Prune Non-Frequent Itemsets**

Assume the minimum support threshold is 60%.

- {A} has support 80% (4/5)
- {B} has support 80% (4/5)
- {C} has support 60% (3/5)
- {A, B} has support 60% (3/5)
- {A, C} has support 40% (2/5) -> Pruned
- {B, C} has support 40% (2/5) -> Pruned
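- {A, B, C} has support 20% (1/5) -> Pruned (Apriori would not even need to count it, because its subsets {A, C} and {B, C} are already infrequent)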

**Step 3: Generate Association Rules**

From the frequent itemsets:

- Rule: {A} → {B} with confidence = support({A, B}) / support({A}) = 60% / 80% = 75%
- Rule: {B} → {A} with confidence = 75%
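
As a quick worked check, lift for the same rule follows from the supports listed in Step 2 and the lift formula under Key Concepts:

\[
\text{Lift}(A \rightarrow B) = \frac{\text{Support}(A \cup B)}{\text{Support}(A) \times \text{Support}(B)} = \frac{0.6}{0.8 \times 0.8} = 0.9375
\]

A lift slightly below 1 suggests that, in this toy dataset, A and B appear together a little less often than they would if they were independent, even though the rule has 75% confidence. This is why lift is usually checked alongside confidence.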

**Step 4: Evaluate and Apply Rules**

These rules can be used for decision-making, such as placing items A and B closer together in a store.

### Applications of Apriori

- **Market Basket Analysis**: Identifying items frequently bought together.
- **Web Usage Mining**: Analyzing clickstream data for patterns.
- **Fraud Detection**: Identifying patterns indicative of fraudulent activities.
- **Bioinformatics**: Discovering associations between genes and diseases.

The Apriori algorithm is fundamental in association rule learning, providing insights into the relationships between items in large datasets, and aiding in decision-making processes across various domains.

## Importing the libraries

In [17]:
!pip install apyori

In [18]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

## Data Preprocessing

In [20]:
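dataset = pd.read_csv('Market_Basket_Optimisation.csv', header = None)
# Each of the 7501 rows is one basket of up to 20 items; shorter rows are padded with NaN,
# so NaN cells are skipped instead of being turned into a spurious 'nan' item.
transactions = [[str(item) for item in row if pd.notna(item)] for row in dataset.values]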

## Training the Apriori model on the dataset

In [21]:
from apyori import apriori
rules = apriori(transactions = transactions, min_support = 0.003, min_confidence = 0.2, min_lift = 3, min_length = 2, max_length = 2)
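
To get a feel for what `min_support = 0.003` means in absolute terms, the fraction can be converted into a transaction count. The sketch below assumes the 7501 transactions loaded in the preprocessing step and assumes the threshold is inclusive (support greater than or equal to the minimum); the variable names are made up for the example.

```python
import math

n_transactions = 7501  # number of baskets in the dataset, as in the preprocessing step
min_support = 0.003    # same value passed to apriori above

# Smallest whole number of baskets whose share of the dataset reaches min_support
min_count = math.ceil(min_support * n_transactions)
print(min_count)
```

Under these assumptions, an itemset has to appear in at least 23 baskets to be considered at all.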

## Visualising the results

### Displaying the raw results returned by the apriori function

In [22]:
results = list(rules)

In [23]:
results

[RelationRecord(items=frozenset({'chicken', 'light cream'}), support=0.004532728969470737, ordered_statistics=[OrderedStatistic(items_base=frozenset({'light cream'}), items_add=frozenset({'chicken'}), confidence=0.29059829059829057, lift=4.84395061728395)]),
 RelationRecord(items=frozenset({'escalope', 'mushroom cream sauce'}), support=0.005732568990801226, ordered_statistics=[OrderedStatistic(items_base=frozenset({'mushroom cream sauce'}), items_add=frozenset({'escalope'}), confidence=0.3006993006993007, lift=3.790832696715049)]),
 RelationRecord(items=frozenset({'escalope', 'pasta'}), support=0.005865884548726837, ordered_statistics=[OrderedStatistic(items_base=frozenset({'pasta'}), items_add=frozenset({'escalope'}), confidence=0.3728813559322034, lift=4.700811850163794)]),
 RelationRecord(items=frozenset({'honey', 'fromage blanc'}), support=0.003332888948140248, ordered_statistics=[OrderedStatistic(items_base=frozenset({'fromage blanc'}), items_add=frozenset({'honey'}), confidence=0

### Organising the results into a pandas DataFrame

In [24]:
def inspect(results):
    # Each record is (items, support, ordered_statistics). Only the first ordered statistic
    # is read, and only the first item on each side, which is enough for 2-item rules.
    lhs         = [tuple(result[2][0][0])[0] for result in results]  # items_base
    rhs         = [tuple(result[2][0][1])[0] for result in results]  # items_add
    supports    = [result[1] for result in results]
    confidences = [result[2][0][2] for result in results]
    lifts       = [result[2][0][3] for result in results]
    return list(zip(lhs, rhs, supports, confidences, lifts))
resultsinDataFrame = pd.DataFrame(inspect(results), columns = ['Left Hand Side', 'Right Hand Side', 'Support', 'Confidence', 'Lift'])
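
The positional indexing in `inspect` can be hard to read. The field names visible in the printed records (`support`, `ordered_statistics`, `items_base`, `items_add`, `confidence`, `lift`) suggest a more readable variant. The sketch below uses local `namedtuple` stand-ins that mirror those names so that it runs without apyori installed; `inspect_by_name` is a hypothetical helper, and the sample record is the first one from the raw output.

```python
from collections import namedtuple

# Stand-ins mirroring the field names shown in the printed results (assumption:
# the real apyori records expose the same attribute names).
RelationRecord = namedtuple("RelationRecord", ["items", "support", "ordered_statistics"])
OrderedStatistic = namedtuple("OrderedStatistic", ["items_base", "items_add", "confidence", "lift"])


def inspect_by_name(results):
    """Flatten each record's first rule into (lhs, rhs, support, confidence, lift)."""
    rows = []
    for record in results:
        stat = record.ordered_statistics[0]
        rows.append((
            next(iter(stat.items_base)),  # single item on the left-hand side
            next(iter(stat.items_add)),   # single item on the right-hand side
            record.support,
            stat.confidence,
            stat.lift,
        ))
    return rows


sample = [RelationRecord(
    items=frozenset({"chicken", "light cream"}),
    support=0.004532728969470737,
    ordered_statistics=[OrderedStatistic(
        items_base=frozenset({"light cream"}),
        items_add=frozenset({"chicken"}),
        confidence=0.29059829059829057,
        lift=4.84395061728395,
    )],
)]
print(inspect_by_name(sample))
```

Like the original `inspect`, this reads only the first rule of each record and one item per side, so it fits 2-item rules and would need extending for longer itemsets.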

### Displaying the unsorted results

In [25]:
resultsinDataFrame

| Index | Left Hand Side       | Right Hand Side | Support  | Confidence | Lift     |
|-------|----------------------|-----------------|----------|------------|----------|
| 0     | light cream          | chicken         | 0.004533 | 0.290598   | 4.843951 |
| 1     | mushroom cream sauce | escalope        | 0.005733 | 0.300699   | 3.790833 |
| 2     | pasta                | escalope        | 0.005866 | 0.372881   | 4.700812 |
| 3     | fromage blanc        | honey           | 0.003333 | 0.245098   | 5.164271 |
| 4     | herb & pepper        | ground beef     | 0.015998 | 0.32345    | 3.291994 |
| 5     | tomato sauce         | ground beef     | 0.005333 | 0.377358   | 3.840659 |
| 6     | light cream          | olive oil       | 0.0032   | 0.205128   | 3.11471  |
| 7     | whole wheat pasta    | olive oil       | 0.007999 | 0.271493   | 4.12241  |
| 8     | pasta                | shrimp          | 0.005066 | 0.322034   | 4.506672 |


### Displaying the results sorted by descending lift

In [26]:
resultsinDataFrame.nlargest(n = 10, columns = 'Lift')

| Index | Left Hand Side       | Right Hand Side | Support  | Confidence | Lift     |
|-------|----------------------|-----------------|----------|------------|----------|
| 3     | fromage blanc        | honey           | 0.003333 | 0.245098   | 5.164271 |
| 0     | light cream          | chicken         | 0.004533 | 0.290598   | 4.843951 |
| 2     | pasta                | escalope        | 0.005866 | 0.372881   | 4.700812 |
| 8     | pasta                | shrimp          | 0.005066 | 0.322034   | 4.506672 |
| 7     | whole wheat pasta    | olive oil       | 0.007999 | 0.271493   | 4.12241  |
| 5     | tomato sauce         | ground beef     | 0.005333 | 0.377358   | 3.840659 |
| 1     | mushroom cream sauce | escalope        | 0.005733 | 0.300699   | 3.790833 |
| 4     | herb & pepper        | ground beef     | 0.015998 | 0.32345    | 3.291994 |
| 6     | light cream          | olive oil       | 0.0032   | 0.205128   | 3.11471  |
