# 🛍️ E-Commerce Customer Segmentation using RFM and K-Means Clustering
In this notebook, we analyze transactions from an online retail dataset and segment customers with RFM analysis (Recency, Frequency, Monetary value) followed by K-Means clustering. The goal is to derive insights that inform marketing and inventory strategies.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

## 📥 Load the Dataset

In [None]:
# Replace with your file path or use the Kaggle API to load the data
# df = pd.read_csv('OnlineRetail.csv')
# Note: reading .xlsx files with pandas requires the openpyxl package
df = pd.read_excel('Online Retail.xlsx')
df.head()

## 🧹 Data Cleaning

In [None]:
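# Drop rows without a customer ID; they cannot be attributed to any customer
df = df.dropna(subset=['CustomerID'])
# Exclude cancelled orders (invoice numbers starting with 'C');
# .copy() avoids a SettingWithCopyWarning when columns are added below
df = df[~df['InvoiceNo'].astype(str).str.startswith('C')].copy()
# Revenue per line item
df['TotalPrice'] = df['Quantity'] * df['UnitPrice']
df['InvoiceDate'] = pd.to_datetime(df['InvoiceDate'])
df.head()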

## 📊 RFM Analysis

In [None]:
# Snapshot date: one day after the last invoice, so Recency is at least 1 for every customer
snapshot_date = df['InvoiceDate'].max() + pd.Timedelta(days=1)
# One row per customer: days since last purchase, number of distinct invoices, total spend
rfm = df.groupby('CustomerID').agg(
    Recency=('InvoiceDate', lambda x: (snapshot_date - x.max()).days),
    Frequency=('InvoiceNo', 'nunique'),
    Monetary=('TotalPrice', 'sum'),
)
rfm.head()
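
To sanity-check the aggregation above, it can help to run the same logic on a handful of rows where the answer is easy to work out by hand. The sketch below uses a made-up `toy` table (the customers, invoice numbers, dates, and prices are all hypothetical) and mirrors the column names used in this notebook.

```python
import pandas as pd

# Hypothetical toy transactions, made up purely for illustration
toy = pd.DataFrame({
    'CustomerID': [1, 1, 1, 2],
    'InvoiceNo': ['A1', 'A1', 'A2', 'B1'],
    'InvoiceDate': pd.to_datetime(
        ['2011-01-01', '2011-01-01', '2011-01-10', '2011-01-05']
    ),
    'TotalPrice': [10.0, 5.0, 20.0, 7.5],
})

# Snapshot is one day after the latest invoice, as in the notebook cell
snapshot = toy['InvoiceDate'].max() + pd.Timedelta(days=1)

toy_rfm = toy.groupby('CustomerID').agg(
    Recency=('InvoiceDate', lambda x: (snapshot - x.max()).days),  # days since last purchase
    Frequency=('InvoiceNo', 'nunique'),                            # distinct invoices
    Monetary=('TotalPrice', 'sum'),                                # total spend
)
print(toy_rfm)
```

Customer 1 has two line items on invoice `A1`, which count once toward Frequency because the aggregation uses `nunique` rather than a row count.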

## 📈 K-Means Clustering

In [None]:
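# Scale only the three RFM features, so re-running this cell after
# the 'Cluster' column has been added does not feed labels into the scaler
scaler = StandardScaler()
rfm_scaled = scaler.fit_transform(rfm[['Recency', 'Frequency', 'Monetary']])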

# Determine optimal number of clusters (Elbow Method)
inertia = []
for k in range(1, 11):
    # n_init is set explicitly because its default differs across scikit-learn versions
    kmeans = KMeans(n_clusters=k, n_init=10, random_state=42)
    kmeans.fit(rfm_scaled)
    inertia.append(kmeans.inertia_)

plt.figure(figsize=(8,4))
plt.plot(range(1, 11), inertia, marker='o')
plt.title('Elbow Method')
plt.xlabel('Number of clusters')
plt.ylabel('Inertia')
plt.show()
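
The elbow can be hard to read when the inertia curve bends gradually. One common complement, not used in the original notebook, is the silhouette score from `sklearn.metrics`. The sketch below is a minimal illustration on synthetic data from `make_blobs`, which stands in for `rfm_scaled` here; the blob centers and the range of `k` are assumptions chosen only to make the example self-contained.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic stand-in for rfm_scaled: three well-separated groups (illustrative only)
X, _ = make_blobs(
    n_samples=300,
    centers=[[0, 0], [10, 10], [-10, 10]],
    cluster_std=0.5,
    random_state=42,
)

scores = {}
for k in range(2, 7):  # the silhouette score needs at least 2 clusters
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

# Higher silhouette means tighter, better-separated clusters
best_k = max(scores, key=scores.get)
print(best_k, round(scores[best_k], 3))
```

On real RFM data the scores are usually much closer together than on this synthetic example, so the result is best treated as one input alongside the elbow plot rather than a final answer.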

In [None]:
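# k=4 is assumed to be the elbow read from the plot above; adjust if your curve bends elsewhere
kmeans = KMeans(n_clusters=4, n_init=10, random_state=42)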
rfm['Cluster'] = kmeans.fit_predict(rfm_scaled)
rfm.head()

## 📉 Cluster Visualization

In [None]:
# One boxplot per RFM feature, grouped by cluster
for feature in ['Recency', 'Frequency', 'Monetary']:
    sns.boxplot(x='Cluster', y=feature, data=rfm)
    plt.title(f'{feature} by Cluster')
    plt.show()

## 🔍 Insights & Recommendations
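K-Means label numbers are arbitrary, so confirm which cluster matches each profile in the boxplots before acting on the mapping below.

- **Cluster 0:** Likely VIPs (low Recency, high Frequency and high Monetary)
- **Cluster 1:** Likely at-risk customers (high Recency, low Frequency and low Monetary)
- **Cluster 2:** Potential loyalists (mid-range on all three metrics)
- **Cluster 3:** Likely one-time buyers (low Frequency)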
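
One way to check which label corresponds to which segment is to summarize each cluster numerically. The sketch below is a minimal example on a made-up table standing in for `rfm`: the values, the three-cluster layout, and the "highest median spend" heuristic for spotting VIPs are all assumptions for illustration.

```python
import pandas as pd

# Hypothetical clustered RFM table standing in for `rfm`; values are made up
toy = pd.DataFrame({
    'Recency':   [2, 5, 200, 250, 40, 60],
    'Frequency': [30, 25, 1, 2, 6, 5],
    'Monetary':  [5000.0, 4200.0, 50.0, 80.0, 600.0, 550.0],
    'Cluster':   [0, 0, 1, 1, 2, 2],
})

# Medians are less sensitive to outliers than means;
# Customers shows how many customers each cluster holds
profile = toy.groupby('Cluster').agg(
    Recency=('Recency', 'median'),
    Frequency=('Frequency', 'median'),
    Monetary=('Monetary', 'median'),
    Customers=('Recency', 'size'),
)

# Candidate VIP cluster: highest median Monetary (one possible heuristic, not the only one)
vip_cluster = profile['Monetary'].idxmax()
print(profile)
print('Highest-spend cluster:', vip_cluster)
```

Applied to the real `rfm` table, the same `groupby` gives a compact profile that can be compared against the segment descriptions above.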

### 📢 Marketing Strategy:
- Retarget at-risk customers with discounts or win-back emails.
- Offer loyalty programs to potential loyalists.
- Optimize inventory for products bought by VIPs.