# RFM Analysis (Customer Segmentation)

Segment customers based on purchasing behavior.

## What is RFM?
- **Recency (R):** Days since last purchase.
- **Frequency (F):** Total number of transactions.
- **Monetary (M):** Total money spent.
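
As a rough illustration of the three definitions above, the sketch below computes R, F and M for a tiny hand-made table. The `toy` data and the names `reference_date` and `toy_rfm` are hypothetical and exist only for this example; the column names mirror the ones used later in this notebook.

```python
import pandas as pd

# Hypothetical toy transactions (not the real dataset).
toy = pd.DataFrame({
    'CustomerID': [1, 1, 2, 3, 3, 3],
    'InvoiceNo': ['A1', 'A2', 'B1', 'C1', 'C2', 'C3'],
    'InvoiceDate': pd.to_datetime([
        '2024-01-05', '2024-03-01', '2024-01-20',
        '2024-02-10', '2024-02-25', '2024-03-09',
    ]),
    'TotalAmount': [50.0, 30.0, 200.0, 20.0, 25.0, 15.0],
})

# Reference date: the day after the most recent purchase in the data.
reference_date = toy['InvoiceDate'].max() + pd.Timedelta(days=1)

toy_rfm = toy.groupby('CustomerID').agg(
    Recency=('InvoiceDate', lambda d: (reference_date - d.max()).days),
    Frequency=('InvoiceNo', 'nunique'),   # distinct invoices per customer
    Monetary=('TotalAmount', 'sum'),      # total spend per customer
)
print(toy_rfm)
```

Customer 3 bought yesterday (Recency 1) and most often (Frequency 3), while customer 2 spent the most (Monetary 200.0) but has been inactive the longest.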

## Goal
Identify 'Champions', 'At Risk', and 'New' customers.

In [None]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings('ignore')

# Load generated data
try:
    df = pd.read_csv('ecommerce_data.csv')
    df['InvoiceDate'] = pd.to_datetime(df['InvoiceDate'])
    print("Data Loaded Successfully")
except FileNotFoundError as exc:
    # Stop here: every later cell needs `df`
    raise FileNotFoundError("Run 01_data_generation.ipynb first!") from exc

## 1. Calculate RFM Metrics

In [None]:
# Set Analysis Date (day after last purchase)
analysis_date = df['InvoiceDate'].max() + pd.Timedelta(days=1)

# Aggregate by Customer
rfm = df.groupby('CustomerID').agg({
    'InvoiceDate': lambda x: (analysis_date - x.max()).days, # Recency
    'InvoiceNo': 'nunique',                                 # Frequency (distinct invoices, not line items)
    'TotalAmount': 'sum'                                    # Monetary
})

# Rename columns
rfm.columns = ['Recency', 'Frequency', 'Monetary']
print(rfm.head())

## 2. RFM Scores (Quantiles)
Score each metric from 1 to 4 by quartile (1 is worst, 4 is best).
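
To make the quartile scoring concrete, here is a minimal sketch on made-up values. The series `recency` and `frequency` are hypothetical. Ranking with `rank(method='first')` before `pd.qcut` is one common workaround for tied values, offered here as an assumption rather than a required step.

```python
import pandas as pd

# Hypothetical recency values (days since last purchase) for eight customers.
recency = pd.Series([2, 5, 9, 14, 30, 45, 60, 90], name='Recency')

# Reversed labels: the lowest-recency quartile gets the top score of 4.
r_score = pd.qcut(recency, q=4, labels=[4, 3, 2, 1]).astype(int)

# Heavily tied values (common for purchase counts) can make qcut raise
# "Bin edges must be unique"; ranking first breaks the ties.
frequency = pd.Series([1, 1, 1, 1, 1, 2, 3, 8], name='Frequency')
f_score = pd.qcut(
    frequency.rank(method='first'), q=4, labels=[1, 2, 3, 4]
).astype(int)

print(pd.DataFrame({
    'Recency': recency, 'R': r_score,
    'Frequency': frequency, 'F': f_score,
}))
```

Note the trade-off: breaking ties by rank keeps the four groups equal in size, but customers with identical raw values can land in different score groups.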

In [None]:
# Lower Recency is better, so its labels are reversed (most recent quartile scores 4)
r_labels = range(4, 0, -1)
f_labels = range(1, 5)
m_labels = range(1, 5)

# Rank first so tied values cannot produce duplicate bin edges in qcut
rfm['R'] = pd.qcut(rfm['Recency'].rank(method='first'), q=4, labels=r_labels).astype(int)
rfm['F'] = pd.qcut(rfm['Frequency'].rank(method='first'), q=4, labels=f_labels).astype(int)
rfm['M'] = pd.qcut(rfm['Monetary'].rank(method='first'), q=4, labels=m_labels).astype(int)

rfm['RFM_Score'] = rfm['R'].map(str) + rfm['F'].map(str) + rfm['M'].map(str)
print(rfm.head())

## 3. Customer Segments
Map scores to segment names.

In [None]:
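def segment_customer(row):
    # Compare R, F and M separately: treating the concatenated score as an
    # integer lets R dominate (411 would outrank 344) and makes the final
    # fallback unreachable. The thresholds below are illustrative rules.
    r, f, m = row['R'], row['F'], row['M']
    if r == 4 and f == 4 and m == 4: return 'Champions'
    if r <= 2 and (f >= 3 or m >= 3): return 'At Risk'  # high value, not recent
    if r >= 3 and f == 1: return 'New'
    if r >= 3 and f >= 3: return 'Loyal'
    if r >= 3: return 'Potential Loyalist'
    return 'Needs Attention'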

rfm['Segment'] = rfm.apply(segment_customer, axis=1)

# Count segments
segment_counts = rfm['Segment'].value_counts()
print(segment_counts)

## 4. Visualization

In [None]:
plt.figure(figsize=(10, 6))
sns.barplot(x=segment_counts.index, y=segment_counts.values, palette='viridis')
plt.title('Customer Segments Distribution')
plt.ylabel('Number of Customers')
plt.show()

In [None]:
# Scatter Plot: Recency vs Monetary
plt.figure(figsize=(10, 6))
sns.scatterplot(
    data=rfm, 
    x='Recency', 
    y='Monetary', 
    hue='Segment', 
    palette='deep'
)
plt.title('Recency vs Monetary by Segment')
plt.show()

## Key Insights

1. **Champions:** Buy often, spend the most, and purchased recently (top quartile on all three scores).
2. **At Risk:** High value but haven't bought in a while.
3. **Action:** Send 'We miss you' emails to the 'At Risk' group.
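
One way to check these insights and act on them is to profile each segment and pull the list of customers to contact. The sketch below assumes a table shaped like `rfm` after segmentation. The `scored` data, the `profile` column names and `at_risk_ids` are hypothetical and used only for illustration.

```python
import pandas as pd

# Hypothetical scored table shaped like `rfm` after the segmentation step.
scored = pd.DataFrame(
    {
        'Recency': [3, 120, 95, 10, 200],
        'Frequency': [12, 9, 7, 1, 1],
        'Monetary': [900.0, 750.0, 640.0, 40.0, 25.0],
        'Segment': ['Champions', 'At Risk', 'At Risk', 'New', 'Needs Attention'],
    },
    index=pd.Index([101, 102, 103, 104, 105], name='CustomerID'),
)

# Average R, F and M per segment, plus the number of customers in each.
profile = scored.groupby('Segment').agg(
    Customers=('Recency', 'size'),
    AvgRecency=('Recency', 'mean'),
    AvgFrequency=('Frequency', 'mean'),
    AvgMonetary=('Monetary', 'mean'),
)
print(profile)

# Customers to target with a win-back email, highest spenders first.
at_risk_ids = (
    scored[scored['Segment'] == 'At Risk']
    .sort_values('Monetary', ascending=False)
    .index.tolist()
)
print(at_risk_ids)
```

If the segment rules behave as intended, 'At Risk' should show high average Monetary alongside high average Recency, and 'Champions' should show low Recency with high Frequency and Monetary.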