
# 🛍️ Case Study 2: Online Retail Dataset (Market Basket Analysis)

In this case study, we will use the **Online Retail dataset** to demonstrate **Association Rule Mining**.  
The dataset contains transactions from a UK-based online store, including invoices and product descriptions.  

We’ll go through:  
1. Loading and preprocessing data  
2. Transforming into basket (one-hot encoded) format  
3. Applying the Apriori algorithm  
4. Extracting association rules  
5. Interpreting results  


In [None]:

import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules


### Step 1: Load the Online Retail dataset

In [None]:

# Note: Replace the path below with the actual dataset location if you have it
# The dataset is available from UCI Machine Learning Repository / Kaggle
# file_path = "OnlineRetail.xlsx"

# For demo purposes, we’ll simulate a small retail dataset in 'long format'
data = {
    'InvoiceNo': [1,1,1,2,2,3,3,3,3],
    'Description': ['Milk','Bread','Butter','Bread','Diapers',
                    'Milk','Diapers','Beer','Cola']
}
df = pd.DataFrame(data)
df
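
The simulated frame above is already clean, but the real file usually needs some preprocessing before Step 2. The sketch below is one possible cleaning pass, not the dataset's official recipe. It assumes the columns `InvoiceNo`, `Description`, and `Quantity`, and that cancelled invoices carry a leading `C` in `InvoiceNo`; check both against your copy of the file. The helper name `clean_transactions` and the sample rows are hypothetical.

```python
import pandas as pd

def clean_transactions(raw: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: basic cleaning before building baskets."""
    out = raw.copy()
    # Invoice numbers may be mixed int/str; compare them as strings
    out['InvoiceNo'] = out['InvoiceNo'].astype(str)
    # Normalise whitespace so 'Milk ' and 'Milk' count as the same item
    out['Description'] = out['Description'].str.strip()
    # Drop rows without a usable product description
    out = out.dropna(subset=['Description'])
    # Assumption: cancellations carry a 'C' prefix in InvoiceNo
    out = out[~out['InvoiceNo'].str.startswith('C')]
    # Assumption: non-positive quantities are returns or adjustments
    out = out[out['Quantity'] > 0]
    return out.reset_index(drop=True)

# Tiny made-up sample to exercise the helper (not real data)
sample = pd.DataFrame({
    'InvoiceNo': ['1', '1', 'C2', '3', '3'],
    'Description': ['Milk ', 'Bread', 'Milk', None, 'Beer'],
    'Quantity': [1, 2, -1, 1, 3],
})
cleaned = clean_transactions(sample)
print(cleaned)
```

The cleaned frame keeps the same long format as `df`, so it can be passed to Step 2 unchanged.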


### Step 2: Convert transactions to basket format

In [None]:

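# Count each (invoice, product) pair, then pivot products into columns
basket = df.groupby(['InvoiceNo', 'Description'])['Description'].count().unstack().fillna(0)
# Turn counts into bought / not-bought flags; mlxtend works best with boolean input,
# and DataFrame.applymap is deprecated in recent pandas versions
basket = basket > 0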
basket.head()
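
As a cross-check on the basket above, the same invoice × product table can be built with `pd.crosstab`. This is an alternative sketch, not the method used in this notebook; the variable name `basket_alt` is introduced only for illustration.

```python
import pandas as pd

# Same toy transactions as in Step 1
df = pd.DataFrame({
    'InvoiceNo': [1, 1, 1, 2, 2, 3, 3, 3, 3],
    'Description': ['Milk', 'Bread', 'Butter', 'Bread', 'Diapers',
                    'Milk', 'Diapers', 'Beer', 'Cola'],
})

# crosstab counts how often each product appears on each invoice;
# comparing with 0 turns the counts into bought / not-bought flags
basket_alt = pd.crosstab(df['InvoiceNo'], df['Description']) > 0
print(basket_alt)
```

Each row is one invoice and each column one product, so the row sums give the number of distinct items per invoice.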


### Step 3: Find frequent itemsets using Apriori

In [None]:

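# With only 3 invoices, min_support=0.3 keeps any itemset that appears in at least one invoice (1/3 ≈ 0.33)
frequent_itemsets = apriori(basket, min_support=0.3, use_colnames=True)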
frequent_itemsets.sort_values(by='support', ascending=False)
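
To make the `support` column concrete, the sketch below recomputes it by brute force on the toy invoices, without mlxtend. It enumerates every item combination, which is only feasible for tiny examples; Apriori avoids this by pruning candidates that contain an infrequent subset. The names `transactions`, `support`, and `frequent` are illustrative.

```python
from itertools import combinations

# The three toy invoices from Step 1, written as sets of items
transactions = [
    {'Milk', 'Bread', 'Butter'},
    {'Bread', 'Diapers'},
    {'Milk', 'Diapers', 'Beer', 'Cola'},
]
min_support = 0.3

def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

# Enumerate every non-empty combination of items and keep the frequent ones
items = sorted(set().union(*transactions))
frequent = {}
for size in range(1, len(items) + 1):
    for combo in combinations(items, size):
        s = support(set(combo))
        if s >= min_support:
            frequent[frozenset(combo)] = s

# Show the five itemsets with the highest support
for itemset, s in sorted(frequent.items(), key=lambda kv: -kv[1])[:5]:
    print(sorted(itemset), round(s, 3))
```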


### Step 4: Generate Association Rules

In [None]:

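# Keep rules with lift >= 1.0, i.e. item pairs that co-occur at least as often as independence would predict
rules = association_rules(frequent_itemsets, metric="lift", min_threshold=1.0)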
rules[['antecedents','consequents','support','confidence','lift']].sort_values(by='lift', ascending=False)
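
The `confidence` and `lift` columns follow directly from supports: confidence(A → B) = support(A ∪ B) / support(A), and lift(A → B) = confidence(A → B) / support(B). The sketch below applies these definitions by hand to two rules on the toy invoices, as a way to sanity-check the table. The helper names are illustrative and are not part of mlxtend.

```python
# The three toy invoices from Step 1, written as sets of items
transactions = [
    {'Milk', 'Bread', 'Butter'},
    {'Bread', 'Diapers'},
    {'Milk', 'Diapers', 'Beer', 'Cola'},
]

def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(antecedent, consequent):
    """How often the consequent appears in invoices containing the antecedent."""
    return support(antecedent | consequent) / support(antecedent)

def lift(antecedent, consequent):
    """Confidence relative to the consequent's baseline frequency."""
    return confidence(antecedent, consequent) / support(consequent)

conf_butter = confidence({'Milk', 'Bread'}, {'Butter'})
lift_butter = lift({'Milk', 'Bread'}, {'Butter'})
conf_diapers = confidence({'Bread'}, {'Diapers'})
lift_diapers = lift({'Bread'}, {'Diapers'})

print('Milk+Bread -> Butter:', round(conf_butter, 2), round(lift_butter, 2))
print('Bread -> Diapers:', round(conf_diapers, 2), round(lift_diapers, 2))
```

A lift above 1 means the items appear together more often than independence would predict; a lift below 1, as for Bread → Diapers here, means less often.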


### Step 5: Filter interesting rules

In [None]:

# Keep rules that are both reliable (confidence > 0.6) and positively associated (lift > 1.2)
filtered_rules = rules[(rules['confidence'] > 0.6) & (rules['lift'] > 1.2)]
filtered_rules[['antecedents','consequents','support','confidence','lift']]



## ✅ Conclusion
- We transformed retail transactions into **basket format** (invoice × products).  
- Using **Apriori**, we identified frequent itemsets and strong rules.  
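- Example from the toy data: the rule **{Milk, Bread} → {Butter}** has confidence 1.0 and lift 3.0, but it rests on a single invoice, so treat it as an illustration rather than evidence of real buying behaviour.  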
- This analysis helps in **product placement, bundling, and recommendation systems** in e-commerce.  
