## ASSOCIATION RULES


### Objective

The objective of this assignment is to introduce students to rule mining techniques, with a particular focus on market basket analysis, and to provide hands-on experience.



In [1]:
# Import libraries
import numpy as np
import pandas as pd
import warnings
warnings.filterwarnings('ignore')  # suppress library warnings in the notebook output
from mlxtend.frequent_patterns import association_rules, apriori
from mlxtend.preprocessing import TransactionEncoder

In [2]:
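# Load the transactions. Note: the file has no header row, so pandas uses the
# first transaction as the column name; pass header=None to keep it as data.
df = pd.read_excel("Online retail.xlsx")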

In [3]:
df

Unnamed: 0,"shrimp,almonds,avocado,vegetables mix,green grapes,whole weat flour,yams,cottage cheese,energy drink,tomato juice,low fat yogurt,green tea,honey,salad,mineral water,salmon,antioxydant juice,frozen smoothie,spinach,olive oil"
0,"burgers,meatballs,eggs"
1,chutney
2,"turkey,avocado"
3,"mineral water,milk,energy bar,whole wheat rice..."
4,low fat yogurt
...,...
7495,"butter,light mayo,fresh bread"
7496,"burgers,frozen vegetables,eggs,french fries,ma..."
7497,chicken
7498,"escalope,green tea"


In [6]:
# Data preprocessing: check the column data types
df.dtypes

shrimp,almonds,avocado,vegetables mix,green grapes,whole weat flour,yams,cottage cheese,energy drink,tomato juice,low fat yogurt,green tea,honey,salad,mineral water,salmon,antioxydant juice,frozen smoothie,spinach,olive oil    object
dtype: object

In [7]:
# Display the first few rows of the dataset
print(df.head())

  shrimp,almonds,avocado,vegetables mix,green grapes,whole weat flour,yams,cottage cheese,energy drink,tomato juice,low fat yogurt,green tea,honey,salad,mineral water,salmon,antioxydant juice,frozen smoothie,spinach,olive oil
0                             burgers,meatballs,eggs
1                                            chutney
2                                     turkey,avocado
3  mineral water,milk,energy bar,whole wheat rice...

In [8]:
# Display the column names and data types
print(df.info())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7500 entries, 0 to 7499
Data columns (total 1 columns):
 #   Column                                                                                                                                                                                                                           Non-Null Count  Dtype 
---  ------                                                                                                                                                                                                                           --------------  ----- 
 0   shrimp,almonds,avocado,vegetables mix,green grapes,whole weat flour,yams,cottage cheese,energy drink,tomato juice,low fat yogurt,green tea,honey,salad,mineral water,salmon,antioxydant juice,frozen smoothie,spinach,olive oil  7500 non-null   object
dtypes: object(1)
memory usage: 58.7+ KB
None


In [9]:
# Data Preprocessing
# Split the items in each transaction into a list
df['Transaction'] = df.iloc[:, 0].apply(lambda x: x.split(','))

In [14]:
# Apply TransactionEncoder to transform the list of items into a one-hot encoded DataFrame
te = TransactionEncoder()
te_ary = te.fit_transform(df['Transaction'])
df_encoded = pd.DataFrame(te_ary, columns=te.columns_)

In [15]:
# Implement Apriori Algorithm
frequent_itemsets = apriori(df_encoded, min_support=0.001, use_colnames=True)


In [16]:
# Generate the rules
rules = association_rules(frequent_itemsets, metric="lift", min_threshold=1)

In [17]:
# Sort rules by lift
rules = rules.sort_values(by='lift', ascending=False)

In [19]:
rules.head(10)

Unnamed: 0,antecedents,consequents,antecedent support,consequent support,support,confidence,lift,representativity,leverage,conviction,zhangs_metric,jaccard,certainty,kulczynski
33927,"(pasta, french fries)","(escalope, mushroom cream sauce)",0.003067,0.005733,0.001067,0.347826,60.667341,1.0,0.001049,1.524542,0.986542,0.137931,0.344065,0.266936
33930,"(escalope, mushroom cream sauce)","(pasta, french fries)",0.005733,0.003067,0.001067,0.186047,60.667341,1.0,0.001049,1.224804,0.989188,0.137931,0.183543,0.266936
33772,"(mineral water, pasta)","(shrimp, eggs)",0.002133,0.014133,0.001333,0.625,44.221698,1.0,0.001303,2.628978,0.979476,0.089286,0.619624,0.35967
33777,"(shrimp, eggs)","(mineral water, pasta)",0.014133,0.002133,0.001333,0.09434,44.221698,1.0,0.001303,1.101811,0.991398,0.089286,0.092403,0.35967
33926,"(pasta, escalope)","(french fries, mushroom cream sauce)",0.005867,0.004667,0.001067,0.181818,38.961039,1.0,0.001039,1.216519,0.980083,0.112676,0.177982,0.205195
33931,"(french fries, mushroom cream sauce)","(pasta, escalope)",0.004667,0.005867,0.001067,0.228571,38.961039,1.0,0.001039,1.288691,0.978902,0.112676,0.224019,0.205195
33935,(mushroom cream sauce),"(pasta, escalope, french fries)",0.019067,0.0016,0.001067,0.055944,34.965035,1.0,0.001036,1.057564,0.990281,0.054422,0.054431,0.361305
33922,"(pasta, escalope, french fries)",(mushroom cream sauce),0.0016,0.019067,0.001067,0.666667,34.965035,1.0,0.001036,2.9428,0.972957,0.054422,0.660188,0.361305
33932,(pasta),"(escalope, french fries, mushroom cream sauce)",0.015733,0.002,0.001067,0.067797,33.898305,1.0,0.001035,1.070582,0.986013,0.064,0.065928,0.300565
33925,"(escalope, french fries, mushroom cream sauce)",(pasta),0.002,0.015733,0.001067,0.533333,33.898305,1.0,0.001035,2.109143,0.972445,0.064,0.525874,0.300565


#### Interview Questions:

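1. What is lift, and why is it important in association rules?

Lift measures the strength of an association rule by comparing how often the antecedent X and the consequent Y actually occur together with how often they would occur together if they were independent:

Lift(X → Y) = Support(X,Y) / (Support(X) × Support(Y)) = Confidence(X → Y) / Support(Y)

A lift greater than 1 indicates a positive association, a lift of exactly 1 indicates independence, and a lift less than 1 indicates a negative association. Lift is important because, unlike confidence, it corrects for how common the consequent is, so a rule does not rank highly merely because its consequent appears in many baskets.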
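As a worked check, the lift of the first rule in the rules table can be reproduced from raw transaction counts. This is a minimal sketch: the counts 8, 23, and 43 are not printed in the notebook but are inferred here by multiplying the reported supports by the 7,500 transactions, and the function name `lift` is illustrative.

```python
def lift(n_both, n_antecedent, n_consequent, n_total):
    """Lift of the rule X -> Y computed from transaction counts."""
    support_both = n_both / n_total        # Support(X,Y)
    support_x = n_antecedent / n_total     # Support(X)
    support_y = n_consequent / n_total     # Support(Y)
    return support_both / (support_x * support_y)


# Counts are an assumption, inferred from the reported supports x 7500:
#   (pasta, french fries)                -> 23 transactions
#   (escalope, mushroom cream sauce)     -> 43 transactions
#   all four items together              ->  8 transactions
top_rule_lift = lift(n_both=8, n_antecedent=23, n_consequent=43, n_total=7500)
print(round(top_rule_lift, 4))  # compare with the lift column of the first rule
```

Under these inferred counts the result agrees with the lift of about 60.67 reported for that rule.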

2. What are support and confidence? How do you calculate them?

Support measures how frequently an itemset appears across all transactions:

Support(X,Y) = (Transactions containing both X and Y) / (Total number of transactions)

Confidence measures how likely the consequent Y is to be in the cart given that the cart already contains the antecedent X:

Confidence(X → Y) = (Transactions containing both X and Y) / (Transactions containing X)
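The two definitions can be checked by hand on a tiny example. The sketch below uses five made-up baskets (hypothetical data, not taken from the dataset) and two illustrative helper functions, `support` and `confidence`.

```python
# Hypothetical toy baskets, not taken from the dataset
transactions = [
    {"pasta", "shrimp", "eggs"},
    {"pasta", "shrimp"},
    {"pasta", "milk"},
    {"eggs", "milk"},
    {"shrimp", "eggs"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Support of antecedent and consequent together, divided by support of the antecedent."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


s = support({"pasta", "shrimp"}, transactions)       # 2 of 5 baskets
c = confidence({"pasta"}, {"shrimp"}, transactions)  # 2 of the 3 pasta baskets
print(f"support = {s:.2f}, confidence = {c:.2f}")
```

Here pasta appears in three baskets and pasta with shrimp in two, so the support is 2/5 and the confidence is 2/3.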



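3. What are some limitations or challenges of association rule mining?

Association rule mining can generate an overwhelming number of rules, many of which are trivial or redundant; in this notebook, a minimum support of 0.001 produced rules with index values above 33,000. The number of candidate itemsets grows exponentially with the number of distinct items, so large datasets and low support thresholds lead to high computational costs. Additionally, the rules capture only co-occurrence within a transaction and may miss more complex relationships, such as causal or sequential ones.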
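To give a sense of scale for the number of possible itemsets, the sketch below counts the subsets of a single 20-item basket, such as the one that ended up as the column header in this notebook. It only counts combinations with `math.comb`; it is not how Apriori enumerates candidates, since Apriori prunes any candidate that has an infrequent subset.

```python
from math import comb

n_items = 20  # number of items in the long basket shown as the column header

# Number of possible itemsets of each size k drawn from those 20 items
subsets_by_size = {k: comb(n_items, k) for k in range(1, n_items + 1)}
total_subsets = sum(subsets_by_size.values())

print("pairs:  ", subsets_by_size[2])
print("triples:", subsets_by_size[3])
print("all non-empty itemsets:", total_subsets)  # equals 2**20 - 1
```

Even this one basket contains over a million non-empty itemsets, which is why a minimum support threshold is needed to keep the search tractable.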

#### Conclusion


The Apriori algorithm successfully identified frequent itemsets and generated association rules. Key findings include:



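1. **Strong Associations:** Rules with high lift values indicate strong relationships between items. The highest-lift rule, (pasta, french fries) → (escalope, mushroom cream sauce), has a lift of about 60.7, but its support of 0.001067 corresponds to only about 8 of the 7,500 transactions, so high lift should be read together with support.

2. **Useful Metrics:** Support measures how frequent an itemset is, while confidence measures the likelihood of the consequent given the antecedent.

3. **Challenges:** With a minimum support of 0.001, the algorithm produces more than 33,000 rules, which is overwhelming to review and computationally intensive to generate.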


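Recommendations include focusing on actionable rules, for example those that combine high lift with enough confidence and support to matter in practice, and raising the minimum support when working with larger datasets.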
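One way to narrow the output to actionable rules is to require a minimum confidence and a minimum number of supporting transactions in addition to high lift. The sketch below applies such a filter to six rows copied from the rules table; the thresholds `MIN_CONFIDENCE` and `MIN_COUNT` are hypothetical values chosen for illustration, not recommendations from the notebook.

```python
# (antecedents, consequents, support, confidence, lift), copied from the rules table
top_rules = [
    (("pasta", "french fries"), ("escalope", "mushroom cream sauce"), 0.001067, 0.347826, 60.667341),
    (("escalope", "mushroom cream sauce"), ("pasta", "french fries"), 0.001067, 0.186047, 60.667341),
    (("mineral water", "pasta"), ("shrimp", "eggs"), 0.001333, 0.625, 44.221698),
    (("shrimp", "eggs"), ("mineral water", "pasta"), 0.001333, 0.09434, 44.221698),
    (("pasta", "escalope", "french fries"), ("mushroom cream sauce",), 0.001067, 0.666667, 34.965035),
    (("escalope", "french fries", "mushroom cream sauce"), ("pasta",), 0.001067, 0.533333, 33.898305),
]

N_TRANSACTIONS = 7500   # from df.info() in the notebook
MIN_CONFIDENCE = 0.5    # hypothetical threshold
MIN_COUNT = 10          # hypothetical minimum number of supporting transactions

# Keep rules whose consequent follows the antecedent at least half the time
confident = [r for r in top_rules if r[3] >= MIN_CONFIDENCE]

# Of those, keep rules backed by at least MIN_COUNT transactions
actionable = [r for r in confident if round(r[2] * N_TRANSACTIONS) >= MIN_COUNT]

for antecedents, consequents, support, confidence, lift in actionable:
    print(antecedents, "->", consequents, f"confidence={confidence:.3f}", f"lift={lift:.1f}")
```

In the notebook itself, the same idea would be expressed as a boolean mask on the `rules` DataFrame, for example `rules[rules['confidence'] >= 0.5]`.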