# Project 02: Market Basket Analysis in Python using Apriori Algorithm
<img src='image/img_market_basket.png' height="5%" width="50%">

### Submitted By: Yashuv Baskota
### Language: Python

### Dataset: https://www.kaggle.com/datasets/heeraldedhia/groceries-dataset

#### What is Market Basket Analysis?
*Market basket analysis* is a technique for identifying relationships between items in large datasets of customer transactions. It is a valuable tool for data mining and machine learning, and is widely used in retail and e-commerce to identify which items are frequently purchased together in a transaction, with the goal of improving sales and customer loyalty.

## 1. Importing Libraries

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

from mlxtend.frequent_patterns import apriori
from mlxtend.frequent_patterns import association_rules

import warnings
warnings.filterwarnings("ignore")

## 2. Reading the Dataset

In [None]:
data = pd.read_csv('data/Groceries_dataset.csv')

In [None]:
data

## 3. EDA

In [None]:
# Get the 20 most frequently purchased items (value_counts already sorts in descending order)
x = data['itemDescription'].value_counts().head(20)

In [None]:
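# Bar chart of the 20 most frequently purchased items
plt.figure(figsize=(10, 6))
sns.barplot(x=x.index, y=x.values)
plt.xlabel('Item')
plt.ylabel('Number of purchases')
plt.xticks(rotation=45, ha='right')
plt.show()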

## 4. Apriori Algorithm
Market basket analysis is typically implemented using association rule learning, which is a technique for discovering interesting relationships between variables in a dataset. One of the most popular algorithms for association rule learning is the Apriori algorithm, which is used to identify frequent item sets and generate association rules. These rules are evaluated using measures such as support, confidence, and lift, which are used to determine the strength of the association between items.

For example, consider a dataset of customer purchases from a grocery store. An association rule might be "If a customer buys bread, they are likely to also buy butter."
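To make the idea of frequent item sets concrete, here is a minimal pure-Python sketch of the level-wise search that Apriori performs. It is only an illustration: the function name `frequent_itemsets` and the toy baskets are assumptions made for this example, and the notebook itself relies on the `mlxtend` implementation rather than this code.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Level-wise Apriori search; returns {frozenset_of_items: support}."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        # Fraction of baskets that contain every item of the itemset
        return sum(itemset <= b for b in baskets) / n

    # Level 1: frequent single items
    current = {}
    for item in {i for b in baskets for i in b}:
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s
    result = dict(current)

    k = 2
    while current:
        # Join step: merge frequent (k-1)-itemsets into candidates of size k
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent
        candidates = {c for c in candidates
                      if all(frozenset(sub) in current
                             for sub in combinations(c, k - 1))}
        # Count step: keep only candidates that reach the support threshold
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
        result.update(current)
        k += 1
    return result

# Hypothetical toy baskets, not taken from the groceries dataset
toy = [
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["bread", "milk"],
    ["butter", "milk"],
    ["bread", "butter", "jam"],
]
itemsets = frequent_itemsets(toy, min_support=0.6)
for itemset, s in sorted(itemsets.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), round(s, 2))
```

The prune step is what keeps the search tractable: an item set can only be frequent if all of its subsets are frequent, so most larger candidates are discarded before they are ever counted.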

The evaluation metrics used to measure the strength of the association between items are:

* **Support**: It is a measure of the frequency of an itemset in the dataset. Mathematically, it is calculated as follows:
$$support(I) = \frac{number\ of\ transactions\ containing\ itemset\ I}{total\ number\ of\ transactions}$$

* **Confidence**: It tells us how often Y is bought in transactions that already contain X, which makes it a measure of the reliability of an association rule. Mathematically, it is calculated as follows:
$$confidence(X \rightarrow Y) = \frac{support(X \cup Y)}{support(X)}$$

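* **Lift**: It is a measure of the strength of an association rule: it compares the confidence of the rule with how often Y is bought overall. A lift above 1 means X and Y occur together more often than expected if they were independent, a lift of 1 means no association, and a lift below 1 means they occur together less often than expected. Mathematically, it is calculated as follows: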
$$lift(X \rightarrow Y) = \frac{confidence(X \rightarrow Y)}{support(Y)}$$

where X and Y are the *antecedent* and *consequent* of the rule, respectively.
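The three formulas above can be checked by hand on a small example. The sketch below assumes five hypothetical baskets (not taken from the groceries dataset) and evaluates the rule bread → butter. The helper names `support`, `confidence` and `lift` are illustrative only and are not the `mlxtend` API.

```python
# Hypothetical toy baskets used only to illustrate the formulas
toy = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"butter", "milk"},
    {"bread", "butter", "jam"},
]

def support(itemset, transactions):
    # Share of transactions that contain every item in the itemset
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(X, Y, transactions):
    # support(X ∪ Y) / support(X)
    return support(set(X) | set(Y), transactions) / support(X, transactions)

def lift(X, Y, transactions):
    # confidence(X → Y) / support(Y)
    return confidence(X, Y, transactions) / support(Y, transactions)

X, Y = {"bread"}, {"butter"}
s = support(X | Y, toy)    # bread and butter appear together in 3 of 5 baskets
c = confidence(X, Y, toy)  # 3 of the 4 baskets with bread also contain butter
l = lift(X, Y, toy)        # confidence divided by the share of baskets with butter (4 of 5)

print(f"support={s:.2f} confidence={c:.2f} lift={l:.2f}")
```

In this toy data the lift comes out slightly below 1 (about 0.94): butter is so common overall that knowing a basket contains bread does not make butter any more likely, even though the confidence of the rule looks high.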

In [None]:
data['Quantity'] = 1

In [None]:
data

In [None]:
# One row per member, one column per item, values = number of times the member bought the item
transactions = (
    data.groupby(['Member_number', 'itemDescription'])['Quantity']
    .sum()
    .unstack()
)

In [None]:
transactions = transactions.fillna(0)

In [None]:
transactions

In [None]:
# Convert purchase counts to a 0/1 flag: 1 if the item was bought at least once
def encode(x):
    return 1 if x > 0 else 0

In [None]:
basket = transactions.applymap(encode)
basket

In [None]:
frequent_itemset = apriori(basket, min_support=0.06, use_colnames=True)
rules = association_rules(frequent_itemset, metric='lift', min_threshold=1)

In [None]:
rules.head()

In [None]:
confidence_threshold = 0.4
lift_threshold = 1
rules[(rules['confidence'] > confidence_threshold) & (rules['lift'] > lift_threshold)]

## 5. Making Item Recommendations

In [None]:
# Recommend items based on a single input item or a list of input items
def recommend_items(items, confidence_threshold=0.4, lift_threshold=1):
    # Apply the same confidence and lift filters to both input types
    strong = (rules['confidence'] > confidence_threshold) & (rules['lift'] > lift_threshold)
    if isinstance(items, str):
        # Single item: keep rules whose antecedent is exactly that item
        items = [items]
        matches = rules['antecedents'].apply(lambda x: set(x) == set(items))
    else:
        # Several items: keep rules whose antecedent contains any of them
        items = list(items)
        matches = rules['antecedents'].apply(lambda x: any(item in x for item in items))
    recommendations = rules.loc[matches & strong]
    # Flatten the consequents of the selected rules, skipping the input items themselves
    recommended_items = [item for consequent in recommendations['consequents']
                         for item in consequent if item not in items]
    return list(set(recommended_items))

In [None]:
# Recommend items for a single input item
recommend_items("yogurt")

In [None]:
# Recommend items for multiple input items
recommend_items(["rolls/buns", "yogurt"])

In [None]:
recommend_items("canned beer")