
# Apriori

## Importing the libraries

In [None]:
!pip install apyori

Collecting apyori
  Downloading https://files.pythonhosted.org/packages/5e/62/5ffde5c473ea4b033490617ec5caa80d59804875ad3c3c57c0976533a21a/apyori-1.1.2.tar.gz
Building wheels for collected packages: apyori
  Building wheel for apyori (setup.py) ... done
  Created wheel for apyori: filename=apyori-1.1.2-cp36-none-any.whl size=5975 sha256=2f70ee6eef5e595a06dedb5b912b95ee5d6760039dd55a4126715d0d547e56cf
  Stored in directory: /root/.cache/pip/wheels/5d/92/bb/474bbadbc8c0062b9eb168f69982a0443263f8ab1711a8cad0
Successfully built apyori
Installing collected packages: apyori
Successfully installed apyori-1.1.2


In [None]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

## Data Preprocessing

In [None]:
dataset = pd.read_csv('Market_Basket_Optimisation.csv', header = None)
transactions = []
for i in range(0, 7501):
    transactions.append([str(dataset.values[i,j]) for j in range(0,20)])

# The apriori function used below works on strings, so every cell in row i is cast with str() as we walk the j columns, and the resulting row is appended to transactions.
# The result is a list of lists: one inner list per basket, e.g. [a, b, c, d, ...] for the first order, [e, f, g] for the second, and so on.
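One side effect of casting every cell with `str()` is that any cell pandas reads as NaN (for example, a short row padded out to 20 columns) lands in the basket as the string `'nan'`. The block below is a minimal sketch of one way to skip such cells. The frame `toy` and the name `toy_transactions` are hypothetical stand-ins, not part of the notebook, and the notebook's own results were produced without this filtering.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the CSV: rows of unequal length, padded with NaN.
toy = pd.DataFrame([
    ["shrimp", "almonds", np.nan],
    ["burgers", np.nan, np.nan],
    ["chutney", "eggs", "honey"],
])

# Build one list of item names per row, skipping the missing cells.
toy_transactions = [
    [str(item) for item in row if pd.notna(item)]
    for row in toy.values
]

print(toy_transactions)
```

Iterating over `toy.values` row by row also avoids hard-coding the number of rows and columns.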

## Training the Apriori model on the dataset

In [8]:
from apyori import apriori
rules = apriori(transactions = transactions, min_support = 0.003, min_confidence = 0.2, min_lift = 3, min_length = 2, max_length = 2)

# min_support: a product bought at least 3 times a day over 7 days, out of 7501 orders => 3*7/7501, approximated to 0.003

# min_length and max_length = 2 -> one product on the left and one product on the right, to find the best two-product deals (buy 1, get 1)

"""
For buy 2, get 1 deals and so on -> min_length = max_length = 3
For flexibility, anything from buy 1, get 1 up to buy 10, get 1 -> min_length = 2 and max_length = 11
"""
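The support threshold above is plain arithmetic, sketched here with the figures from the comment (3 purchases a day, 7 days, 7501 orders). The variable names are illustrative only.

```python
# Figures taken from the comment above.
purchases_per_day = 3
days = 7
n_transactions = 7501

# Share of all orders that must contain the item set.
min_support = purchases_per_day * days / n_transactions

print(round(min_support, 4))  # 0.0028, which the notebook rounds to 0.003
```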

## Visualising the results

### Displaying the first results coming directly from the output of the apriori function

In [9]:
results = list(rules)
results

"""
In the first record,
items_base is light cream
and items_add is chicken,
so baskets that contain light cream also contain chicken about 29% of the time (confidence),
which is about 4.8 times the rate across all baskets (lift).
"""

[RelationRecord(items=frozenset({'chicken', 'light cream'}), support=0.004532728969470737, ordered_statistics=[OrderedStatistic(items_base=frozenset({'light cream'}), items_add=frozenset({'chicken'}), confidence=0.29059829059829057, lift=4.84395061728395)]),
 RelationRecord(items=frozenset({'mushroom cream sauce', 'escalope'}), support=0.005732568990801226, ordered_statistics=[OrderedStatistic(items_base=frozenset({'mushroom cream sauce'}), items_add=frozenset({'escalope'}), confidence=0.3006993006993007, lift=3.790832696715049)]),
 RelationRecord(items=frozenset({'pasta', 'escalope'}), support=0.005865884548726837, ordered_statistics=[OrderedStatistic(items_base=frozenset({'pasta'}), items_add=frozenset({'escalope'}), confidence=0.3728813559322034, lift=4.700811850163794)]),
 RelationRecord(items=frozenset({'honey', 'fromage blanc'}), support=0.003332888948140248, ordered_statistics=[OrderedStatistic(items_base=frozenset({'fromage blanc'}), items_add=frozenset({'honey'}), confidence=0

### Putting the results well organised into a Pandas DataFrame

In [11]:
"""
How the indexing below works, using the very first entry in results.

Each RelationRecord has 3 fields:
index 0 -> items=frozenset({'chicken', 'light cream'})
index 1 -> support=0.004532728969470737
index 2 -> ordered_statistics=[OrderedStatistic(items_base=frozenset({'light cream'}), items_add=frozenset({'chicken'}), confidence=0.29059829059829057, lift=4.84395061728395)]

ordered_statistics (index 2) is a list. In the records shown above it holds a single OrderedStatistic, which result[2][0] selects. That OrderedStatistic has 4 fields:
index 0 -> items_base=frozenset({'light cream'})
index 1 -> items_add=frozenset({'chicken'})
index 2 -> confidence=0.29059829059829057
index 3 -> lift=4.84395061728395

For the LHS and RHS we take fields 0 and 1 of the OrderedStatistic. Each is a frozenset holding one item, so we convert it to a tuple and take entry 0.
E.g. tuple(frozenset({'light cream'}))[0] gives 'light cream'.
"""

def inspect(results):
    lhs         = [tuple(result[2][0][0])[0] for result in results]
    rhs         = [tuple(result[2][0][1])[0] for result in results]
    supports    = [result[1] for result in results]
    confidences = [result[2][0][2] for result in results]
    lifts       = [result[2][0][3] for result in results]
    return list(zip(lhs, rhs, supports, confidences, lifts))
# zip pairs up the i-th entry of each list above into one tuple per rule; list() then gives a list of rows that can be passed to the DataFrame.
resultsinDataFrame = pd.DataFrame(inspect(results), columns = ['Left Hand Side', 'Right Hand Side', 'Support', 'Confidence', 'Lift'])
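The positional indexes in `inspect` can also be written with field names, since the printed output shows named fields (`items`, `support`, `ordered_statistics`, `items_base`, `items_add`, `confidence`, `lift`). The block below is a sketch under that assumption. The two namedtuples are hypothetical stand-ins built to mirror the printed records so that the example runs without apyori, and `inspect_by_name` is an illustrative name, not part of the notebook.

```python
from collections import namedtuple

# Hypothetical stand-ins mirroring the field names printed in the output above.
OrderedStatistic = namedtuple(
    "OrderedStatistic", ["items_base", "items_add", "confidence", "lift"]
)
RelationRecord = namedtuple(
    "RelationRecord", ["items", "support", "ordered_statistics"]
)

# The first record from the output above, rebuilt by hand.
sample = [
    RelationRecord(
        items=frozenset({"chicken", "light cream"}),
        support=0.004532728969470737,
        ordered_statistics=[
            OrderedStatistic(
                items_base=frozenset({"light cream"}),
                items_add=frozenset({"chicken"}),
                confidence=0.29059829059829057,
                lift=4.84395061728395,
            )
        ],
    )
]

def inspect_by_name(results):
    """Return (lhs, rhs, support, confidence, lift) rows using field names."""
    rows = []
    for record in results:
        stat = record.ordered_statistics[0]  # first rule of the record only
        rows.append((
            next(iter(stat.items_base)),  # single item on the left-hand side
            next(iter(stat.items_add)),   # single item on the right-hand side
            record.support,
            stat.confidence,
            stat.lift,
        ))
    return rows

print(inspect_by_name(sample))
```

Like the original `inspect`, this reads only the first rule of each record and assumes one item on each side.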

### Displaying the results non sorted

In [12]:
resultsinDataFrame

| | Left Hand Side | Right Hand Side | Support | Confidence | Lift |
|---|---|---|---|---|---|
| 0 | light cream | chicken | 0.004533 | 0.290598 | 4.843951 |
| 1 | mushroom cream sauce | escalope | 0.005733 | 0.300699 | 3.790833 |
| 2 | pasta | escalope | 0.005866 | 0.372881 | 4.700812 |
| 3 | fromage blanc | honey | 0.003333 | 0.245098 | 5.164271 |
| 4 | herb & pepper | ground beef | 0.015998 | 0.32345 | 3.291994 |
| 5 | tomato sauce | ground beef | 0.005333 | 0.377358 | 3.840659 |
| 6 | light cream | olive oil | 0.0032 | 0.205128 | 3.11471 |
| 7 | whole wheat pasta | olive oil | 0.007999 | 0.271493 | 4.12241 |
| 8 | pasta | shrimp | 0.005066 | 0.322034 | 4.506672 |


### Displaying the results sorted by descending lifts

In [13]:
resultsinDataFrame.nlargest(n = 10, columns = 'Lift')

| | Left Hand Side | Right Hand Side | Support | Confidence | Lift |
|---|---|---|---|---|---|
| 3 | fromage blanc | honey | 0.003333 | 0.245098 | 5.164271 |
| 0 | light cream | chicken | 0.004533 | 0.290598 | 4.843951 |
| 2 | pasta | escalope | 0.005866 | 0.372881 | 4.700812 |
| 8 | pasta | shrimp | 0.005066 | 0.322034 | 4.506672 |
| 7 | whole wheat pasta | olive oil | 0.007999 | 0.271493 | 4.12241 |
| 5 | tomato sauce | ground beef | 0.005333 | 0.377358 | 3.840659 |
| 1 | mushroom cream sauce | escalope | 0.005733 | 0.300699 | 3.790833 |
| 4 | herb & pepper | ground beef | 0.015998 | 0.32345 | 3.291994 |
| 6 | light cream | olive oil | 0.0032 | 0.205128 | 3.11471 |


Confidence is the probability that a basket containing X also contains Y.

Lift is that confidence divided by the overall share of baskets containing Y, so it controls for how popular Y is on its own.

Let's take the first entry as an example.

Fromage blanc and honey were bought together in about 0.33% of the 7501 baskets, i.e. about 25 baskets -> Support

Of the baskets that contained fromage blanc, 24.5% also contained honey -> Confidence

A customer with fromage blanc in the basket is therefore about 5.16 times as likely to buy honey as a customer picked at random -> Lift

Lift is the improvement in the prediction compared to suggesting products at random.
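To make the three measures concrete, here is a small sketch that computes them by counting baskets. The function `rule_metrics` and the eight toy baskets are invented for illustration and are not taken from the dataset, so the numbers differ from the table above.

```python
def rule_metrics(transactions, lhs, rhs):
    """Support, confidence and lift for the single-item rule lhs -> rhs."""
    n = len(transactions)
    n_lhs = sum(1 for t in transactions if lhs in t)
    n_rhs = sum(1 for t in transactions if rhs in t)
    n_both = sum(1 for t in transactions if lhs in t and rhs in t)

    support = n_both / n            # share of baskets with both items
    confidence = n_both / n_lhs     # share of lhs baskets that also hold rhs
    lift = confidence / (n_rhs / n) # confidence relative to rhs's base rate
    return support, confidence, lift

# Invented toy baskets, for illustration only.
toy_baskets = [
    {"fromage blanc", "honey"},
    {"fromage blanc", "eggs"},
    {"honey", "eggs"},
    {"eggs"},
    {"eggs", "milk"},
    {"milk"},
    {"fromage blanc", "honey", "milk"},
    {"milk", "eggs"},
]

support, confidence, lift = rule_metrics(toy_baskets, "fromage blanc", "honey")
print(support, confidence, lift)
```

In the toy data, 2 of 8 baskets hold both items, 3 hold fromage blanc and 3 hold honey, so support is 2/8, confidence is 2/3 and lift is (2/3) / (3/8) = 16/9.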