# Apriori

`%pip install apyori` is a Jupyter Notebook magic command that installs the `apyori` package, a Python library for association rule mining.

In [16]:
%pip install apyori

Defaulting to user installation because normal site-packages is not writeable
Note: you may need to restart the kernel to use updated packages.


# Data Preprocessing

The following cell imports three Python libraries: NumPy, Matplotlib, and pandas.

In [17]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Importing Dataset

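This code reads the CSV file 'digbasket.csv' into a pandas DataFrame called `dataset`. The `header=None` argument tells pandas that the file has no header row, so the first line is treated as data. The loop then converts the 7219 rows into a list of transactions, each a list of 20 strings. Because every cell is passed through `str()`, empty cells (NaN) become the string `'nan'`, which the algorithm treats as an item like any other.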

In [18]:
dataset = pd.read_csv('digbasket.csv', header=None)
transactions = []

# Convert each of the 7219 rows into a list of 20 strings (one per column).
for i in range(0, 7219):
    transactions.append([str(dataset.values[i, j]) for j in range(0, 20)])
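
Because `str()` turns every empty cell into the string `'nan'`, the transaction list above contains `'nan'` as if it were a product. One possible alternative, sketched here on a small hypothetical frame, drops missing cells before building each transaction. The names `toy` and `to_transactions` are illustrative and not part of the notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical three-row stand-in for the basket data; short rows are padded
# with NaN, as read_csv does for rows with fewer items.
toy = pd.DataFrame([
    ["chutney", np.nan, np.nan],
    ["knor", "ginger garlic paste", "MTR Idli"],
    ["eggs", np.nan, np.nan],
])

def to_transactions(frame):
    """Return one list of item names per row, skipping missing cells."""
    return [
        [str(item) for item in row if pd.notna(item)]
        for row in frame.values.tolist()
    ]

toy_transactions = to_transactions(toy)
print(toy_transactions)
```

Rules mined from a list built this way would differ from the outputs shown in this notebook, which were produced with `'nan'` included.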

In [19]:
print(dataset)

                 0                    1         2             3   \
0           chutney                  NaN       NaN           NaN   
1              knor  ginger garlic paste  MTR Idli           NaN   
2            turkey          spirit fish  tomatoes     spaghetti   
3              eggs                  NaN       NaN           NaN   
4            kinley                  NaN       NaN           NaN   
...             ...                  ...       ...           ...   
7214  mineral water            green tea       NaN           NaN   
7215      green tea                  NaN       NaN           NaN   
7216  patanjali tea              chicken      eggs  french fries   
7217  mineral water                 milk      cake      brownies   
7218       neckrest         french fries       NaN           NaN   

                 4              5       6     7        8   \
0               NaN            NaN     NaN   NaN      NaN   
1               NaN            NaN     NaN   NaN      NaN   


# Apriori Training on Dataset

This code applies the Apriori algorithm from the `apyori` library to the transactions. `transactions` is the input data. `min_support=0.002` keeps only itemsets that occur in at least 0.2% of all transactions. `min_confidence=0.2` keeps only rules whose right-hand side appears in at least 20% of the transactions that contain the left-hand side. `min_lift=3` keeps only rules whose items occur together at least three times as often as they would if they were independent. `min_len=2` is meant to restrict the analysis to itemsets of at least two items; note, however, that `min_len` is not among the keyword arguments `apyori` documents (`min_support`, `min_confidence`, `min_lift`, `max_length`), so it most likely has no effect. The call returns a generator, stored in `basket_intel`, that yields the association rules as they are computed.

In [20]:
from apyori import apriori
basket_intel= apriori(transactions= transactions, min_support=0.002, min_confidence=0.2, min_lift=3, min_len=2)
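
To make the three thresholds concrete, the following sketch computes support, confidence, and lift by hand. It uses the standard definitions of these measures on four hypothetical baskets; it is not how `apyori` is implemented, and `baskets`, `support`, `confidence`, and `lift` are illustrative names.

```python
# Hypothetical baskets, chosen so the arithmetic is easy to follow.
baskets = [
    {"pasta", "sauce"},
    {"pasta", "sauce"},
    {"milk"},
    {"milk", "bread"},
]

def support(itemset):
    """Fraction of baskets that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in baskets) / len(baskets)

def confidence(base, add):
    """Of the baskets containing base, the fraction that also contain add."""
    return support(set(base) | set(add)) / support(base)

def lift(base, add):
    """Confidence relative to how often add occurs on its own."""
    return confidence(base, add) / support(add)

print(support({"pasta", "sauce"}))       # 0.5
print(confidence({"pasta"}, {"sauce"}))  # 1.0
print(lift({"pasta"}, {"sauce"}))        # 2.0
```

In this toy data the rule pasta → sauce has a lift of 2.0, so it would pass `min_support=0.002` and `min_confidence=0.2` but be rejected by `min_lift=3`.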

# Visualizing

In [None]:
results= list(basket_intel)
results

In [10]:
print(results)

[RelationRecord(items=frozenset({'5 star', 'pancakes'}), support=0.0022163734589278295, ordered_statistics=[OrderedStatistic(items_base=frozenset({'5 star'}), items_add=frozenset({'pancakes'}), confidence=0.326530612244898, lift=3.446234634204559)]), RelationRecord(items=frozenset({'parle g', 'MTR Idli'}), support=0.00207785011774484, ordered_statistics=[OrderedStatistic(items_base=frozenset({'MTR Idli'}), items_add=frozenset({'parle g'}), confidence=0.2631578947368421, lift=3.247413405308142)]), RelationRecord(items=frozenset({'almonds', 'burgers'}), support=0.005402410306136584, ordered_statistics=[OrderedStatistic(items_base=frozenset({'almonds'}), items_add=frozenset({'burgers'}), confidence=0.26530612244897955, lift=3.0594966421073218)]), RelationRecord(items=frozenset({'turkey', 'barbecue sauce'}), support=0.002493420141293808, ordered_statistics=[OrderedStatistic(items_base=frozenset({'barbecue sauce'}), items_add=frozenset({'turkey'}), confidence=0.22499999999999998, lift=3.585

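The code defines a function called `inspect` that flattens each result into a tuple of five values: a product from the rule's left-hand side, a product from its right-hand side, the support, the confidence, and the lift. It reads only the first ordered statistic of each result and only the first item on each side, so a rule involving more than two products is represented by a single product per side. The list of tuples is then loaded into a DataFrame.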

In [22]:
def inspect(results):
    # Each result is (items, support, ordered_statistics).
    # Only the first ordered statistic, and the first item on each side of it, is kept.
    product1 = [tuple(result[2][0][0])[0] for result in results]
    product2 = [tuple(result[2][0][1])[0] for result in results]
    supports = [result[1] for result in results]
    confidence = [result[2][0][2] for result in results]
    lifts = [result[2][0][3] for result in results]
    return list(zip(product1, product2, supports, confidence, lifts))

DataFrame_intel = pd.DataFrame(inspect(results), columns=['Product1', 'Product2', 'Support', 'Confidence', 'lift'])
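
The positional indexing in `inspect` can be hard to read. As a sketch, assuming the records expose the field names visible in the printed results (`items`, `support`, `ordered_statistics`, `items_base`, `items_add`, `confidence`, `lift`) as attributes, the same extraction can be written with named fields while keeping every rule and every item. The two namedtuples below are hypothetical stand-ins so the example runs without `apyori`, and `inspect_all` is an illustrative name.

```python
from collections import namedtuple

# Hypothetical stand-ins mirroring the field names visible in the printed results.
RelationRecord = namedtuple("RelationRecord", ["items", "support", "ordered_statistics"])
OrderedStatistic = namedtuple("OrderedStatistic", ["items_base", "items_add", "confidence", "lift"])

def inspect_all(results):
    """Return one row per rule, keeping every item on both sides."""
    rows = []
    for record in results:
        for stat in record.ordered_statistics:
            rows.append((
                ", ".join(sorted(stat.items_base)),
                ", ".join(sorted(stat.items_add)),
                record.support,
                stat.confidence,
                stat.lift,
            ))
    return rows

# First record from the printed output above.
sample = [
    RelationRecord(
        items=frozenset({"5 star", "pancakes"}),
        support=0.0022163734589278295,
        ordered_statistics=[
            OrderedStatistic(frozenset({"5 star"}), frozenset({"pancakes"}),
                             0.326530612244898, 3.446234634204559),
        ],
    ),
]
print(inspect_all(sample))
```

Joining the sorted items makes multi-item rules visible in full instead of showing one arbitrary product per side.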
 

In [23]:
DataFrame_intel

| | Product1 | Product2 | Support | Confidence | lift |
|---|---|---|---|---|---|
| 0 | 5 star | pancakes | 0.002216 | 0.326531 | 3.446235 |
| 1 | MTR Idli | parle g | 0.002078 | 0.263158 | 3.247413 |
| 2 | almonds | burgers | 0.005402 | 0.265306 | 3.059497 |
| 3 | barbecue sauce | turkey | 0.002493 | 0.225000 | 3.585596 |
| 4 | buns | paneer | 0.016069 | 0.324022 | 3.285277 |
| ... | ... | ... | ... | ... | ... |
| 487 | spaghetti | | 0.003048 | 0.333333 | 3.492501 |
| 488 | paneer | | 0.002216 | 0.355556 | 3.725335 |
| 489 | tomatoes | spaghetti | 0.002216 | 0.307692 | 10.888386 |
| 490 | paneer | mineral water | 0.002078 | 0.326087 | 9.019240 |


`DataFrame_intel.nlargest(n=10, columns='lift')` returns the 10 rows of `DataFrame_intel` with the largest values in the `lift` column, sorted in descending order.

In [24]:
DataFrame_intel.nlargest(n=10, columns='lift')

| | Product1 | Product2 | Support | Confidence | lift |
|---|---|---|---|---|---|
| 145 | pasta | pepper spray | 0.002493 | 0.461538 | 23.970116 |
| 373 | pasta | | 0.002493 | 0.461538 | 23.970116 |
| 391 | tomatoes | spaghetti | 0.002216 | 0.307692 | 10.888386 |
| 489 | tomatoes | spaghetti | 0.002216 | 0.307692 | 10.888386 |
| 420 | paneer | mineral water | 0.002078 | 0.326087 | 9.01924 |
| 490 | paneer | mineral water | 0.002078 | 0.326087 | 9.01924 |
| 328 | paneer | spaghetti | 0.002493 | 0.219512 | 7.767934 |
| 461 | paneer | spaghetti | 0.002493 | 0.219512 | 7.767934 |
| 220 | chicken | spaghetti | 0.002078 | 0.277778 | 7.653732 |
| 407 | chicken | spaghetti | 0.002078 | 0.277778 | 7.653732 |
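
Several rows in the output above are identical apart from their index, so the top 10 contains only a handful of distinct rules. These repeats may come from larger itemsets that contain the `'nan'` placeholder, since only one item per side is displayed. One way to rank distinct rows, sketched here on a few rows copied from the table (the names `rules` and `top_unique` are illustrative), is to drop exact duplicates before calling `nlargest`:

```python
import pandas as pd

# A few rows copied from the output above, including two duplicated pairs.
rules = pd.DataFrame(
    [
        ("pasta", "pepper spray", 0.002493, 0.461538, 23.970116),
        ("tomatoes", "spaghetti", 0.002216, 0.307692, 10.888386),
        ("tomatoes", "spaghetti", 0.002216, 0.307692, 10.888386),
        ("paneer", "mineral water", 0.002078, 0.326087, 9.01924),
        ("paneer", "mineral water", 0.002078, 0.326087, 9.01924),
    ],
    columns=["Product1", "Product2", "Support", "Confidence", "lift"],
)

# Remove rows that repeat an earlier row exactly, then rank by lift.
top_unique = rules.drop_duplicates().nlargest(n=3, columns="lift")
print(top_unique)
```

`drop_duplicates` keeps the first occurrence of each repeated row, so the original index of that first row is preserved in the result.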
