<a href="https://www.kaggle.com/code/tusharaggarwal27/association-rule-mining-apriori-algorithm?scriptVersionId=113332337" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

# Association Rule Mining (Apriori Algorithm)


**Importing the required libraries**


In [1]:
!pip install apyori

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

Collecting apyori
  Downloading apyori-1.1.2.tar.gz (8.6 kB)
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: apyori
  Building wheel for apyori (setup.py) ... done
  Created wheel for apyori: filename=apyori-1.1.2-py3-none-any.whl size=5974 sha256=4aa3ca72c308ff30d10e0ba73af77213745bd81400ad9b07f36ac1aab276a798
  Stored in directory: /root/.cache/pip/wheels/cb/f6/e1/57973c631d27efd1a2f375bd6a83b2a616c4021f24aab84080
Successfully built apyori
Installing collected packages: apyori
Successfully installed apyori-1.1.2

**Importing Data into Notebook**


In [2]:
market_data = pd.read_csv("/kaggle/input/marketbasketoptimisationdata/Market_Basket_Optimisation.csv", header=None)

In [3]:
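# Without parentheses, this displays the bound method (and with it the whole DataFrame repr);
# call market_data.head() to see only the first five rows.
market_data.head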

<bound method NDFrame.head of                  0                  1            2                 3   \
0            shrimp            almonds      avocado    vegetables mix   
1           burgers          meatballs         eggs               NaN   
2           chutney                NaN          NaN               NaN   
3            turkey            avocado          NaN               NaN   
4     mineral water               milk   energy bar  whole wheat rice   
...             ...                ...          ...               ...   
7496         butter         light mayo  fresh bread               NaN   
7497        burgers  frozen vegetables         eggs      french fries   
7498        chicken                NaN          NaN               NaN   
7499       escalope          green tea          NaN               NaN   
7500           eggs    frozen smoothie  yogurt cake    low fat yogurt   

                4                 5     6               7             8   \
0     green grape

**Data Preprocessing**


Currently we have the data in the form of a pandas DataFrame.

The Apriori library we are going to use (`from apyori import apriori`) requires the dataset to be a list of lists: the whole dataset is the outer list, and each transaction is an inner list.

To create this list, we start with an empty list and populate it with the transactions from the DataFrame, using a for loop over the rows and the list's `append` method to add each transaction.

Two loops are needed: one over the 7501 transactions, and a second over the columns of each transaction (the DataFrame has 20), so that each item lands in its transaction's inner list as a separate string.

To convert the pandas DataFrame into a list of lists, execute the following script:


In [4]:
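transactions = []

# Build one inner list per transaction; str() turns empty cells (NaN) into the string 'nan'.
# Note: range(1, 20) starts at column 1, so column 0 is not included.
# Use range(0, 20) to cover all 20 columns (the outputs below were produced with range(1, 20)).
for i in range(0, 7501):
    transactions.append([str(market_data.values[i, j]) for j in range(1, 20)])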
    
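As a point of comparison with the loop above, here is a minimal sketch of a conversion that keeps every column and skips empty cells instead of turning them into `'nan'` strings. The `toy` frame and the name `toy_transactions` are made up for illustration. Applying this approach to `market_data` would change the supports reported later in this notebook, so treat it as an alternative rather than the method used here.

```python
import pandas as pd

# Hypothetical three-row stand-in for market_data: one row per transaction,
# with None marking an empty slot.
toy = pd.DataFrame([
    ["shrimp", "almonds", "avocado"],
    ["burgers", "eggs", None],
    ["chutney", None, None],
])

# Walk every row and every column, keeping only the cells that hold a value.
toy_transactions = [
    [str(item) for item in row if pd.notna(item)]
    for row in toy.values.tolist()
]

print(toy_transactions)
```

Each inner list now has a different length, which is fine for a list-of-lists input.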

## Training the Apriori model on the dataset


In [5]:
from apyori import apriori

In [6]:
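# apyori 1.1.2 documents min_support, min_confidence, min_lift and max_length as keyword arguments;
# min_length is not among them, so it most likely has no effect on the result.
rules = apriori(transactions=transactions, min_support=0.01, min_confidence=0.3, min_lift=3, min_length=2, max_length=3)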
rules

<generator object apriori at 0x7fbb5407fdd0>
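To make the three thresholds passed to `apriori` concrete, the sketch below computes support, confidence and lift by hand for one rule on a tiny made-up dataset. It uses the standard textbook definitions, not apyori's internals, and the names `toy_transactions` and `support` are illustrative.

```python
# Five made-up transactions, each stored as a set of items.
toy_transactions = [
    {"ground beef", "spaghetti"},
    {"ground beef", "spaghetti", "milk"},
    {"spaghetti"},
    {"milk"},
    {"ground beef"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

# Rule under test: ground beef -> spaghetti
base, add = {"ground beef"}, {"spaghetti"}

rule_support = support(base | add, toy_transactions)          # 2 of 5 transactions
confidence = rule_support / support(base, toy_transactions)   # (2/5) / (3/5)
lift = confidence / support(add, toy_transactions)            # confidence / (3/5)

print(f"support={rule_support:.3f} confidence={confidence:.3f} lift={lift:.3f}")
```

A lift above 1 means the two sides appear together more often than they would if they were independent, which is why the notebook filters on `min_lift=3`.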

## Visualizing the results in tabular form


**Displaying the raw results returned by the apriori function**

*   The raw output is a list of nested records rather than a table, which makes it hard to interpret
*   We therefore convert the output into a pandas DataFrame


In [7]:
results = list(rules)
results

[RelationRecord(items=frozenset({'ground beef', 'spaghetti'}), support=0.029062791627782962, ordered_statistics=[OrderedStatistic(items_base=frozenset({'ground beef'}), items_add=frozenset({'spaghetti'}), confidence=0.42003853564547206, lift=3.3095683360049217)]),
 RelationRecord(items=frozenset({'milk', 'soup'}), support=0.012931609118784161, ordered_statistics=[OrderedStatistic(items_base=frozenset({'soup'}), items_add=frozenset({'milk'}), confidence=0.3222591362126246, lift=3.055961796119971)]),
 RelationRecord(items=frozenset({'tomatoes', 'spaghetti'}), support=0.01546460471937075, ordered_statistics=[OrderedStatistic(items_base=frozenset({'tomatoes'}), items_add=frozenset({'spaghetti'}), confidence=0.3853820598006645, lift=3.036502973282336)]),
 RelationRecord(items=frozenset({'ground beef', 'mineral water', 'spaghetti'}), support=0.012531662445007332, ordered_statistics=[OrderedStatistic(items_base=frozenset({'ground beef', 'mineral water'}), items_add=frozenset({'spaghetti'}), c

In [8]:
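# Same call with a lower min_support (0.003 instead of 0.01), so rarer itemsets are kept.
# As before, min_length is not a documented apyori keyword and most likely has no effect.
rules = apriori(transactions=transactions, min_support=0.003, min_confidence=0.3, min_lift=3, min_length=2, max_length=3)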
results = list(rules)
results

[RelationRecord(items=frozenset({'bacon', 'spaghetti'}), support=0.0030662578322890282, ordered_statistics=[OrderedStatistic(items_base=frozenset({'bacon'}), items_add=frozenset({'spaghetti'}), confidence=0.3898305084745763, lift=3.0715531975502066)]),
 RelationRecord(items=frozenset({'spaghetti', 'grated cheese'}), support=0.005599253432875617, ordered_statistics=[OrderedStatistic(items_base=frozenset({'grated cheese'}), items_add=frozenset({'spaghetti'}), confidence=0.42, lift=3.3092647058823528)]),
 RelationRecord(items=frozenset({'ground beef', 'herb & pepper'}), support=0.005999200106652446, ordered_statistics=[OrderedStatistic(items_base=frozenset({'herb & pepper'}), items_add=frozenset({'ground beef'}), confidence=0.3237410071942446, lift=4.678962032685989)]),
 RelationRecord(items=frozenset({'ground beef', 'spaghetti'}), support=0.029062791627782962, ordered_statistics=[OrderedStatistic(items_base=frozenset({'ground beef'}), items_add=frozenset({'spaghetti'}), confidence=0.4200
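To see what lowering `min_support` from 0.01 to 0.003 means in absolute terms, the short calculation below converts each threshold into a minimum number of transactions out of the 7501 in this dataset. It assumes an itemset qualifies when its support is at least the threshold. The names `n_transactions` and `min_counts` are illustrative.

```python
import math

n_transactions = 7501  # number of rows in the dataset

# Smallest whole number of transactions whose share reaches each threshold.
min_counts = {
    min_support: math.ceil(min_support * n_transactions)
    for min_support in (0.01, 0.003)
}

for min_support, count in min_counts.items():
    print(f"min_support={min_support}: itemset must appear in at least {count} transactions")
```

This is consistent with the output above: the `bacon`/`spaghetti` itemset has a support of about 0.00307, which corresponds to 23 of 7501 transactions.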

## **Organising the results into a pandas DataFrame**


In [9]:
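def inspect(results):
    # Each result is (items, support, ordered_statistics); only the first ordered statistic is read.
    # tuple(...)[0] keeps a single item per side, so a multi-item side is shortened to one item.
    lhs         = [tuple(result[2][0][0])[0] for result in results]
    rhs         = [tuple(result[2][0][1])[0] for result in results]
    supports    = [result[1] for result in results]
    confidences = [result[2][0][2] for result in results]
    lifts       = [result[2][0][3] for result in results]
    # One (lhs, rhs, support, confidence, lift) tuple per result
    return list(zip(lhs, rhs, supports, confidences, lifts))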

resultsinDataFrame = pd.DataFrame(inspect(results), columns = ['Left Hand Side', 'Right Hand Side', 'Support', 'Confidence', 'Lift'])
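The `inspect` helper above reads one item from each side of the first ordered statistic only. As a hedged alternative, the sketch below keeps every item on both sides and every ordered statistic. To stay runnable without apyori, it defines hypothetical namedtuple stand-ins that mirror the field names visible in the printed output (`items`, `support`, `ordered_statistics`, `items_base`, `items_add`, `confidence`, `lift`). The function name `flatten_rules` and the numbers in `sample` are invented for illustration.

```python
from collections import namedtuple

# Hypothetical stand-ins mirroring the field names shown in the printed records.
OrderedStatistic = namedtuple(
    "OrderedStatistic", ["items_base", "items_add", "confidence", "lift"]
)
RelationRecord = namedtuple(
    "RelationRecord", ["items", "support", "ordered_statistics"]
)

def flatten_rules(records):
    """Return one row per ordered statistic, keeping all items on both sides."""
    rows = []
    for record in records:
        for stat in record.ordered_statistics:
            rows.append((
                ", ".join(sorted(stat.items_base)),  # full left-hand side
                ", ".join(sorted(stat.items_add)),   # full right-hand side
                record.support,
                stat.confidence,
                stat.lift,
            ))
    return rows

# Invented numbers, used only to exercise the function.
sample = [
    RelationRecord(
        items=frozenset({"ground beef", "mineral water", "spaghetti"}),
        support=0.0125,
        ordered_statistics=[
            OrderedStatistic(
                items_base=frozenset({"ground beef", "mineral water"}),
                items_add=frozenset({"spaghetti"}),
                confidence=0.3,
                lift=3.0,
            )
        ],
    )
]

print(flatten_rules(sample))
```

The rows can be passed to `pd.DataFrame` with the same five column names used above.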

**Displaying the results (not sorted in ascending or descending order of lift)**


In [10]:
resultsinDataFrame

| | Left Hand Side | Right Hand Side | Support | Confidence | Lift |
|---|---|---|---|---|---|
| 0 | bacon | spaghetti | 0.003066 | 0.389831 | 3.071553 |
| 1 | grated cheese | spaghetti | 0.005599 | 0.420000 | 3.309265 |
| 2 | herb & pepper | ground beef | 0.005999 | 0.323741 | 4.678962 |
| 3 | ground beef | spaghetti | 0.029063 | 0.420039 | 3.309568 |
| 4 | tomato sauce | ground beef | 0.003600 | 0.329268 | 4.758847 |
| ... | ... | ... | ... | ... | ... |
| 103 | pepper | spaghetti | 0.007732 | 0.420290 | 3.311549 |
| 104 | red wine | spaghetti | 0.005333 | 0.454545 | 3.581455 |
| 105 | tomato sauce | spaghetti | 0.005599 | 0.512195 | 4.035689 |
| 106 | tomatoes | spaghetti | 0.015465 | 0.385382 | 3.036503 |
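Since the table above is unsorted, one possible next step is to rank the rules by lift. The sketch below does this with `DataFrame.nlargest` on a four-row excerpt copied from the table. The names `sample` and `top` are illustrative, and the same call could be applied to `resultsinDataFrame`.

```python
import pandas as pd

# Four rows copied from the table above, used as a small stand-in for resultsinDataFrame.
sample = pd.DataFrame(
    [
        ("bacon", "spaghetti", 0.003066, 0.389831, 3.071553),
        ("herb & pepper", "ground beef", 0.005999, 0.323741, 4.678962),
        ("tomato sauce", "ground beef", 0.003600, 0.329268, 4.758847),
        ("tomato sauce", "spaghetti", 0.005599, 0.512195, 4.035689),
    ],
    columns=["Left Hand Side", "Right Hand Side", "Support", "Confidence", "Lift"],
)

# Keep the three rules with the highest lift, in descending order of lift.
top = sample.nlargest(n=3, columns="Lift")
print(top)
```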
