# Project 3: Customer Segmentation (Clustering)

**Goal:** Segment customers into meaningful groups based on behavioral characteristics using K-Means clustering.

This kind of segmentation helps a business identify its high-value customer groups, tailor marketing strategies to each group, and personalize communication.

**Dataset Example:** an RFM (Recency, Frequency, Monetary) customer dataset, or any `customers.csv` with numeric behavioral columns.
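If only a raw transaction log is available, the three RFM columns have to be derived first. The sketch below assumes a hypothetical log with `CustomerID`, `InvoiceDate` and `Amount` columns (these names are not from the original dataset). It measures Recency in days from a snapshot date one day after the last transaction. Frequency here counts transaction rows. If one invoice spans several rows, counting unique invoice numbers would be more accurate.

```python
import pandas as pd

# Hypothetical transaction log; CustomerID, InvoiceDate and Amount are assumed column names.
tx = pd.DataFrame({
    "CustomerID": [1, 1, 2, 3, 3, 3],
    "InvoiceDate": pd.to_datetime([
        "2024-01-05", "2024-03-01", "2024-02-10",
        "2024-01-20", "2024-02-15", "2024-03-10",
    ]),
    "Amount": [50.0, 30.0, 200.0, 20.0, 25.0, 40.0],
})

# Reference ("snapshot") date: one day after the last transaction in the log.
snapshot = tx["InvoiceDate"].max() + pd.Timedelta(days=1)

# Named aggregation: one row per customer with Recency (days), Frequency (rows), Monetary (sum).
rfm = tx.groupby("CustomerID").agg(
    Recency=("InvoiceDate", lambda d: (snapshot - d.max()).days),
    Frequency=("InvoiceDate", "count"),
    Monetary=("Amount", "sum"),
).reset_index()
print(rfm)
```

The resulting `rfm` table has the `Recency`, `Frequency` and `Monetary` columns that the feature-selection cell looks for.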


In [None]:
import pandas as pd, numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

sns.set(style='whitegrid')

In [None]:
df = pd.read_csv('customers.csv')
print('Shape:', df.shape)
df.head()
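If `customers.csv` is not at hand, a small synthetic table lets the rest of the notebook run end to end. This is a sketch with made-up distributions: uniform Recency, Poisson Frequency and gamma-distributed Monetary. It contains no real customer structure, so any clusters found on it are only useful for smoke-testing the pipeline. Assign `df = df_synthetic` to use it in place of the CSV.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for customers.csv (hypothetical data, for testing the pipeline only).
rng = np.random.default_rng(42)
n = 300
df_synthetic = pd.DataFrame({
    "CustomerID": np.arange(1, n + 1),
    "Recency": rng.integers(1, 365, size=n),          # days since last purchase, 1..364
    "Frequency": rng.poisson(5, size=n) + 1,          # at least one purchase per customer
    "Monetary": rng.gamma(shape=2.0, scale=150.0, size=n).round(2),  # right-skewed spend
})
print(df_synthetic.shape)
df_synthetic.head()
```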

In [None]:
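# Use the RFM features if present; otherwise fall back to the first six numeric columns.
# Drop identifier columns (e.g. a customer ID) beforehand: they carry no behavioral signal.
features = [col for col in ['Recency', 'Frequency', 'Monetary'] if col in df.columns]

if not features:
    # include='number' also covers int32/float32, not just 64-bit dtypes
    num = df.select_dtypes(include='number')
    features = num.columns.tolist()[:6]

# Drop rows with missing values; X keeps the original index
X = df[features].dropna()
print('Features used:', features)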

In [None]:
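# Standardize: K-Means uses Euclidean distance, so features on larger scales (e.g. Monetary) would otherwise dominate
scaler = StandardScaler()
Xs = scaler.fit_transform(X)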
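Standardization rescales each column to `z = (x - mean) / std`, using the population standard deviation. A quick check on a toy matrix with hypothetical values confirms that `StandardScaler` matches the manual formula. It also shows that two features on very different scales end up directly comparable:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy matrix: the second feature is on a much larger scale than the first.
X_demo = np.array([[1.0, 100.0],
                   [2.0, 300.0],
                   [3.0, 500.0]])

Xs_demo = StandardScaler().fit_transform(X_demo)

# Manual z-score; StandardScaler uses the population std (ddof=0), as does np.std by default.
manual = (X_demo - X_demo.mean(axis=0)) / X_demo.std(axis=0)
print(np.allclose(Xs_demo, manual))  # True
print(Xs_demo.round(3))
```

After scaling, both columns have mean 0 and standard deviation 1, so neither dominates the distance computation.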

In [None]:
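# Fit K-Means for k = 2..10 and record the within-cluster sum of squares (inertia)
sse = []
K = range(2, 11)
for k in K:
    # n_init set explicitly: its default changed across scikit-learn versions
    km = KMeans(n_clusters=k, n_init=10, random_state=42)
    km.fit(Xs)
    sse.append(km.inertia_)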

plt.plot(K, sse, '-o')
plt.xlabel('Number of clusters (k)')
plt.ylabel('SSE (Inertia)')
plt.title('Elbow Method for K Selection')
plt.show()
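The elbow in an inertia curve is often ambiguous, because inertia always decreases as k grows. One common complement is to pick the k with the highest silhouette score. The sketch below runs that search on toy data with three well-separated blobs. This data is a stand-in assumption; in the notebook, `Xs` would take the place of `X_toy`.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Toy data with three well-separated groups (stand-in for the scaled features Xs).
X_toy, _ = make_blobs(n_samples=300, centers=[[0, 0], [10, 10], [-10, 10]],
                      cluster_std=1.0, random_state=0)

# Silhouette score for each candidate number of clusters.
scores = {}
for n_clusters in range(2, 11):
    km_k = KMeans(n_clusters=n_clusters, n_init=10, random_state=42)
    scores[n_clusters] = silhouette_score(X_toy, km_k.fit_predict(X_toy))

best_k = max(scores, key=scores.get)
print({k_: round(s, 3) for k_, s in scores.items()})
print("Best k by silhouette:", best_k)
```

Silhouette rewards tight, well-separated clusters, so it tends to peak at the true group count on clean data. On real customer data the peak is often flatter, and it is best read alongside the elbow plot.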

In [None]:
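# k chosen from the elbow plot above; adjust it for your data
k = 4
km = KMeans(n_clusters=k, n_init=10, random_state=42)
labels = km.fit_predict(Xs)
print('Silhouette Score:', silhouette_score(Xs, labels))

# Profile each cluster by its mean feature values, in original (unscaled) units
df_clustered = X.copy()
df_clustered['cluster'] = labels
df_clustered.groupby('cluster').mean()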
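`km.cluster_centers_` is expressed in standardized units, which are hard to interpret. Standardization is linear, so `scaler.inverse_transform` maps the centroids back to the original units. At convergence, the result should match the per-cluster means from the `groupby`, up to floating-point error. The sketch below uses hypothetical RFM-like data with two obvious groups and also reports cluster sizes:

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical RFM-like data: three recent heavy buyers and three lapsed light buyers.
X_demo = pd.DataFrame({
    "Recency":   [5, 7, 6, 200, 210, 190],
    "Frequency": [20, 22, 18, 2, 1, 3],
    "Monetary":  [900.0, 950.0, 880.0, 40.0, 35.0, 50.0],
})

scaler_demo = StandardScaler()
Xs_demo = scaler_demo.fit_transform(X_demo)
km_demo = KMeans(n_clusters=2, n_init=10, random_state=42).fit(Xs_demo)

# Map the scaled centroids back to the original feature units and add cluster sizes.
centers = pd.DataFrame(scaler_demo.inverse_transform(km_demo.cluster_centers_),
                       columns=X_demo.columns)
centers["size"] = np.bincount(km_demo.labels_)
print(centers.round(1))
```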

In [None]:
# PCA is used only for the 2-D view; the clusters were fit on all scaled features
pca = PCA(n_components=2)
components = pca.fit_transform(Xs)

plt.figure(figsize=(7,5))
plt.scatter(components[:,0], components[:,1], c=labels, cmap='tab10', alpha=0.7)
plt.title('Customer Segments Visualized (PCA)')
plt.xlabel('PC1')
plt.ylabel('PC2')
plt.show()
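A 2-D PCA scatter shows only part of the variance. Clusters that overlap in the plot may still be well separated in the full feature space. Checking `explained_variance_ratio_` indicates how faithful the picture is. The sketch below uses hypothetical data in which one feature nearly duplicates another, so two components should capture almost everything:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Hypothetical 3-feature data: the third column is the first plus a little noise.
a = rng.normal(size=500)
b = rng.normal(size=500)
X_demo = np.column_stack([a, b, a + 0.05 * rng.normal(size=500)])

pca_demo = PCA(n_components=2).fit(X_demo)
ratios = pca_demo.explained_variance_ratio_
print("Explained variance per component:", ratios.round(3))
print("Total captured by the 2-D plot:", ratios.sum().round(3))
```

In the notebook, `pca.explained_variance_ratio_` gives the same information for the real features. A low total is a hint not to read too much into overlaps in the scatter.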

### Insights & Business Impact
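- Comparing cluster means separates high-value customers (low Recency, high Frequency and Monetary) from lapsed or low-spend ones.
- These profiles support targeted promotions and retention strategies for each group.
- The same pipeline can be extended with additional attributes from CRM data.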
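Turning cluster means into business labels is usually a judgment call. One simple, hypothetical rule compares each cluster's mean to a reference profile. Here the reference is the unweighted mean of the cluster means; customer-level medians such as `X.median()` are a reasonable alternative. The function name `describe_segment`, its thresholds and the segment names are all illustrative assumptions:

```python
import pandas as pd

# Hypothetical cluster profile, shaped like the output of groupby('cluster').mean().
profile = pd.DataFrame(
    {"Recency": [10.0, 150.0, 40.0],
     "Frequency": [25.0, 2.0, 8.0],
     "Monetary": [1200.0, 60.0, 300.0]},
    index=pd.Index([0, 1, 2], name="cluster"),
)

def describe_segment(row, ref):
    """Hypothetical rule: label a cluster by comparing its means to a reference profile."""
    recent = row["Recency"] <= ref["Recency"]        # lower Recency = bought more recently
    frequent = row["Frequency"] >= ref["Frequency"]
    big_spender = row["Monetary"] >= ref["Monetary"]
    if recent and frequent and big_spender:
        return "high value"
    if not recent and not frequent:
        return "at risk / lapsed"
    return "mid value"

# Reference profile: unweighted mean of the cluster means.
ref = profile.mean()
profile["segment"] = profile.apply(describe_segment, axis=1, ref=ref)
print(profile)
```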

**Next steps:**
- Try hierarchical clustering or DBSCAN.
- Add demographic segmentation.
- Build marketing strategies for each cluster.
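As a starting point for trying other algorithms, the sketch below fits K-Means, Ward-linkage agglomerative clustering and DBSCAN on the same toy standardized data and compares their silhouette scores. The data, `eps=0.3` and `min_samples=5` are assumptions. DBSCAN's `eps` in particular depends heavily on the data and needs tuning.

```python
from sklearn.cluster import DBSCAN, AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Toy stand-in for the scaled feature matrix Xs.
X_toy, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.8, random_state=7)
Xs_toy = StandardScaler().fit_transform(X_toy)

models = {
    "kmeans": KMeans(n_clusters=4, n_init=10, random_state=42),
    "ward": AgglomerativeClustering(n_clusters=4, linkage="ward"),
    "dbscan": DBSCAN(eps=0.3, min_samples=5),  # eps is data-dependent; tune it
}

results = {}
for name, model in models.items():
    toy_labels = model.fit_predict(Xs_toy)
    n_found = len(set(toy_labels) - {-1})  # DBSCAN marks noise points with label -1
    # Silhouette needs at least 2 clusters; noise points are scored as their own group here.
    score = silhouette_score(Xs_toy, toy_labels) if n_found >= 2 else float("nan")
    results[name] = (n_found, score)
    print(f"{name:>7}: {n_found} clusters, silhouette={score:.3f}")
```

Unlike K-Means and Ward, DBSCAN chooses the number of clusters itself and can leave outliers unassigned. This can be useful for spotting unusual customers, but it makes its silhouette score less directly comparable.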
