# Customer Segmentation Using Unsupervised Learning


**Abstract:** 
In the competitive retail landscape, mass marketing is often inefficient. This project applies **K-Means Clustering** to segment mall customers based on demographic and behavioral data (Age, Income, Spending Score). By moving from a generalist approach to data-driven micro-segmentation, we aim to derive actionable insights for personalized marketing strategies.

**Methodology:**
1. **Exploratory Data Analysis (EDA):** Univariate distributions and pairwise correlation analysis.
2. **Data Preprocessing:** Feature selection and Z-score Normalization (StandardScaler) to handle variance disparities.
3. **Model Optimization:** Using the **Elbow Method** (Inertia) and **Silhouette Analysis** to determine the optimal number of clusters ($k$).
4. **Modeling:** Implementation of K-Means with `k-means++` initialization, which spreads the initial centroids apart and typically speeds up convergence.
5. **3D Visualization:** Plotting clusters in a 3-dimensional space for interpretability.

### 1. Library Import & Environment Setup
We use `Pandas` for data manipulation, `Seaborn` for statistical visualization, `Plotly` for interactive 3D plotting, and `scikit-learn` for scaling and clustering.

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import silhouette_score

# Configuration for cleaner output
import warnings
warnings.filterwarnings('ignore')
sns.set_theme(style="whitegrid", palette="deep")

### 2. Data Loading & Statistical Audit
The dataset contains basic information about customers, including their unique ID, Gender, Age, Annual Income, and Spending Score.

In [None]:
df = pd.read_csv("Mall_Customers.csv")

# Checking data structure
print(f"Dataset Dimensions: {df.shape}")
display(df.head())

**Statistical Summary:** 
We analyze the central tendency and dispersion of the data. This helps identify whether scaling is required (e.g., Annual Income ranges from 15-137 k$ while Age ranges from 18-70).

In [None]:
display(df.describe().T)

### 3. Exploratory Data Analysis (EDA)
#### 3.1 Univariate Analysis: Distributions
Before clustering, we must understand the underlying distribution of each feature. We use Kernel Density Estimation (KDE) to visualize the probability density.

In [None]:
fig, axes = plt.subplots(1, 3, figsize=(18, 5))

sns.histplot(df['Age'], kde=True, ax=axes[0], color='skyblue')
axes[0].set_title('Distribution of Age')

sns.histplot(df['Annual Income (k$)'], kde=True, ax=axes[1], color='orange')
axes[1].set_title('Distribution of Income')

sns.histplot(df['Spending Score (1-100)'], kde=True, ax=axes[2], color='green')
axes[2].set_title('Distribution of Spending Score')

plt.show()

#### 3.2 Bivariate Analysis: Correlation Heatmap
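We check the pairwise correlations between features. In clustering, strongly correlated features effectively count the same information twice in the distance computation, which can bias the result toward that shared dimension.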

In [None]:
plt.figure(figsize=(8, 6))
# Dropping CustomerID and Gender for correlation matrix
corr_matrix = df[['Age', 'Annual Income (k$)', 'Spending Score (1-100)']].corr()
sns.heatmap(corr_matrix, annot=True, cmap='coolwarm', fmt=".2f")
plt.title('Feature Correlation Matrix')
plt.show()

### 4. Data Preprocessing
**Why Scaling?** 
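K-Means assigns points by Euclidean distance, so features measured on larger scales contribute more to the distance. If income were recorded in dollars (e.g., 15,000-137,000) next to Age (18-70), the algorithm would be driven almost entirely by income. Even with income expressed in k$, the features have different variances, so we standardize them before clustering.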

We use **StandardScaler** to transform data such that $\mu = 0$ and $\sigma = 1$.

In [None]:
features = ['Age', 'Annual Income (k$)', 'Spending Score (1-100)']
X = df[features]

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

print("Data standardization complete.")
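To make the effect of scaling concrete, here is a minimal sketch on hypothetical toy data. The array `X_toy` and the helper `euclidean` are illustrative assumptions and not part of the mall dataset; column 0 mimics income in dollars and column 1 mimics age. It compares distances before and after standardization and checks that each scaled column has mean 0 and standard deviation 1.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: column 0 mimics income in dollars, column 1 age in years.
X_toy = np.array([
    [15000.0, 20.0],
    [15500.0, 65.0],
    [90000.0, 21.0],
])

def euclidean(a, b):
    """Plain Euclidean distance between two 1-D arrays."""
    return float(np.sqrt(np.sum((a - b) ** 2)))

# Unscaled: customer 0 looks far closer to customer 1 (a 45-year age gap)
# than to customer 2, purely because income is measured in larger units.
d01_raw = euclidean(X_toy[0], X_toy[1])
d02_raw = euclidean(X_toy[0], X_toy[2])

X_toy_scaled = StandardScaler().fit_transform(X_toy)
col_means = X_toy_scaled.mean(axis=0)
col_stds = X_toy_scaled.std(axis=0)  # population std (ddof=0), as StandardScaler uses

# Scaled: both differences are now expressed in standard deviations.
d01_scaled = euclidean(X_toy_scaled[0], X_toy_scaled[1])
d02_scaled = euclidean(X_toy_scaled[0], X_toy_scaled[2])

print(f"raw:    d(0,1)={d01_raw:.1f}, d(0,2)={d02_raw:.1f}")
print(f"scaled: d(0,1)={d01_scaled:.2f}, d(0,2)={d02_scaled:.2f}")
print("column means:", col_means.round(6), "column stds:", col_stds.round(6))
```

In the raw data the income gap dominates the distance; after standardization the age gap and the income gap carry comparable weight.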

### 5. Model Selection: Finding Optimal $K$
We employ two techniques to determine the number of clusters:
1. **The Elbow Method:** Measures the Within-Cluster Sum of Squares (WCSS). We look for the point where the decrease in WCSS slows down (the "elbow").
2. **Silhouette Score:** Measures how similar a point is to its own cluster compared with the nearest other cluster. Values range from -1 to 1, and higher is better.

In [None]:
# Calculating WCSS for K=1 to K=10
wcss = []
silhouette_scores = []

for k in range(1, 11):
    kmeans = KMeans(n_clusters=k, init='k-means++', random_state=42)
    kmeans.fit(X_scaled)
    wcss.append(kmeans.inertia_)
    
    # Silhouette score is only valid for k > 1
    if k > 1:
        score = silhouette_score(X_scaled, kmeans.labels_)
        silhouette_scores.append(score)

**Visualizing the Elbow Curve:**

In [None]:
plt.figure(figsize=(10, 5))
plt.plot(range(1, 11), wcss, marker='o', linestyle='--', color='crimson')
plt.title('Elbow Method: WCSS vs Number of Clusters')
plt.xlabel('Number of Clusters (K)')
plt.ylabel('WCSS (Inertia)')
plt.grid(True)
plt.show()

**Interpretation:** The plot shows an "elbow" around **K=5** (K=6 is also plausible). We will proceed with K=5 as a reasonable balance between cluster compactness and interpretability.
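The loop above also collects silhouette scores, which can serve as a second opinion on the elbow reading. The following is a minimal sketch of selecting $k$ by the highest mean silhouette score. The helper `silhouette_by_k` and the synthetic `X_demo` array are assumptions made for illustration; in the notebook, `X_scaled` would take the place of `X_demo`, and `n_init=10` is a choice made here rather than a setting from the cells above.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

def silhouette_by_k(X, k_values, random_state=42):
    """Return {k: mean silhouette score} for each candidate k (k >= 2)."""
    scores = {}
    for k in k_values:
        model = KMeans(n_clusters=k, init="k-means++", n_init=10,
                       random_state=random_state)
        labels = model.fit_predict(X)
        scores[k] = silhouette_score(X, labels)
    return scores

# Synthetic stand-in for X_scaled: four well-separated blobs in 3 dimensions.
demo_centers = [[-5, -5, -5], [5, 5, -5], [5, -5, 5], [-5, 5, 5]]
X_demo, _ = make_blobs(n_samples=200, centers=demo_centers,
                       cluster_std=0.5, random_state=0)

scores = silhouette_by_k(X_demo, range(2, 11))
best_k = max(scores, key=scores.get)

for k, s in scores.items():
    print(f"k={k:2d}  silhouette={s:.3f}")
print("Highest silhouette at k =", best_k)
```

On real data the silhouette maximum and the elbow do not always agree; when they differ, interpretability of the resulting segments is a reasonable tie-breaker.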

### 6. Model Training
We initialize K-Means with `k-means++`, which chooses starting centroids that are spread far apart. This typically speeds up convergence and reduces the risk of ending in a poor local optimum.

In [None]:
# Final Model with K=5
kmeans_final = KMeans(n_clusters=5, init='k-means++', random_state=42)
df['Cluster'] = kmeans_final.fit_predict(X_scaled)

print("Clustering completed. Labels assigned.")
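To illustrate what `k-means++` initialization does, here is a simplified sketch of its seeding step: the first centroid is drawn uniformly, and each later one is drawn with probability proportional to the squared distance to the nearest centroid chosen so far. The function `kmeans_pp_seed` and the toy data are hypothetical, and scikit-learn's own implementation differs in detail, so this should be read as an illustration of the idea rather than the library's code.

```python
import numpy as np

def kmeans_pp_seed(X, k, rng):
    """Pick k initial centroids from the rows of X using D^2-weighted sampling."""
    n_samples = X.shape[0]
    # First centroid: chosen uniformly at random.
    chosen = [int(rng.integers(n_samples))]
    # Squared distance from every point to its nearest chosen centroid so far.
    closest_sq = np.sum((X - X[chosen[0]]) ** 2, axis=1)
    for _ in range(1, k):
        # Sample the next centroid with probability proportional to D^2,
        # so points far from the existing centroids are favoured.
        probs = closest_sq / closest_sq.sum()
        idx = int(rng.choice(n_samples, p=probs))
        chosen.append(idx)
        new_sq = np.sum((X - X[idx]) ** 2, axis=1)
        closest_sq = np.minimum(closest_sq, new_sq)
    return X[chosen]

rng = np.random.default_rng(42)
# Hypothetical toy data: three tight groups along the diagonal.
X_toy = np.vstack([
    rng.normal(loc=0.0, scale=0.1, size=(20, 2)),
    rng.normal(loc=5.0, scale=0.1, size=(20, 2)),
    rng.normal(loc=10.0, scale=0.1, size=(20, 2)),
])

seeds = kmeans_pp_seed(X_toy, k=3, rng=rng)
print(seeds.round(2))
```

Because an already chosen point has zero distance to itself, it cannot be drawn again, and distant groups are very likely to each receive a seed.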

### 7. 3D Visualization & Interpretation
Since we used three features, a 2D plot would result in information loss. A 3D scatter plot allows us to view the separation of clusters across Age, Income, and Spending Score simultaneously.

In [None]:
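# Interactive 3D plot
# Cast the integer labels to strings so Plotly treats clusters as discrete
# categories instead of mapping them onto a continuous color scale.
plot_df = df.assign(Cluster=df['Cluster'].astype(str))

fig = px.scatter_3d(plot_df,
                    x='Age',
                    y='Annual Income (k$)',
                    z='Spending Score (1-100)',
                    color='Cluster',
                    opacity=0.8,
                    title="3D Cluster Visualization",
                    labels={'Cluster': 'Customer Segment'},
                    color_discrete_sequence=px.colors.qualitative.Bold)
fig.show()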

### 8. Conclusion & Strategic Recommendations
By analyzing the centroids of the derived clusters, we can profile the customer segments as follows:

| Cluster | Characteristics (Age / Income / Spend) | Persona | Strategy |
| :--- | :--- | :--- | :--- |
| **0** | Middle Age / Medium / Medium | **The Average Joe** | Standard promotions and retention strategies. |
| **1** | Young / High / High | **The Elite** | VIP treatment, luxury brand marketing, and exclusive access. |
| **2** | Old / High / Low | **The Conservative Wealthy** | Focus on "Value for Money" and quality assurance. |
| **3** | Young / Low / High | **The Impulse Buyer** | Target with flash sales and trend-focused marketing. |
| **4** | Any / Low / Low | **The Budget Conscious** | Offer discounts, coupons, and clearance sales. |
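The profiles above rest on the cluster centroids, which live in standardized space after fitting. A minimal sketch of mapping them back to the original units is given below. The helper `profile_clusters` and the synthetic `demo` frame are assumptions for illustration only; in the notebook, `scaler.inverse_transform(kmeans_final.cluster_centers_)` would play the same role. K-Means label numbers are arbitrary, so a table like the one above should be matched to the centroids of the fitted model.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

def profile_clusters(data, feature_cols, n_clusters, random_state=42):
    """Fit K-Means on standardized features; return centroids in original units plus sizes."""
    scaler = StandardScaler()
    X_scaled = scaler.fit_transform(data[feature_cols])
    model = KMeans(n_clusters=n_clusters, init="k-means++", n_init=10,
                   random_state=random_state)
    labels = model.fit_predict(X_scaled)
    # Map centroids from z-score space back to the original units.
    centroids = scaler.inverse_transform(model.cluster_centers_)
    profile = pd.DataFrame(centroids, columns=feature_cols)
    profile.index.name = "Cluster"
    profile["Size"] = np.bincount(labels, minlength=n_clusters)
    return profile.round(1)

# Hypothetical stand-in for the mall data: two obvious groups of customers.
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "Age": np.concatenate([rng.normal(25, 2, 50), rng.normal(55, 2, 50)]),
    "Annual Income (k$)": np.concatenate([rng.normal(80, 5, 50), rng.normal(30, 5, 50)]),
    "Spending Score (1-100)": np.concatenate([rng.normal(85, 5, 50), rng.normal(20, 5, 50)]),
})

profile = profile_clusters(demo, list(demo.columns), n_clusters=2)
print(profile)
```

Reading the rows of such a profile table (mean age, income, and spending score per cluster, plus cluster size) is what turns numeric labels into personas.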

**Future Work:** 
To improve this model, we could incorporate:
* **Hierarchical Clustering** (Dendrograms) to validate K.
* **Categorical Data:** Including Gender in the analysis using K-Prototypes (the mixed-data extension of K-Modes) or Gower distance.
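As a rough illustration of the hierarchical-clustering check, the sketch below builds a Ward linkage with SciPy, cuts it into $k$ clusters, and compares the result with K-Means using the adjusted Rand index. The synthetic `X_demo` array, the choice of Ward linkage, and `k = 3` are assumptions for the example, not results from this dataset.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score

# Synthetic stand-in for the standardized features: three separated blobs.
demo_centers = [[-5, -5, -5], [5, 5, -5], [0, 5, 5]]
X_demo, _ = make_blobs(n_samples=150, centers=demo_centers,
                       cluster_std=0.5, random_state=0)

k = 3

# Agglomerative clustering with Ward linkage, cut into at most k clusters.
Z = linkage(X_demo, method="ward")
hier_labels = fcluster(Z, t=k, criterion="maxclust")

# K-Means with the same k for comparison.
km_labels = KMeans(n_clusters=k, init="k-means++", n_init=10,
                   random_state=42).fit_predict(X_demo)

# The adjusted Rand index ignores label numbering; 1.0 means identical partitions.
ari = adjusted_rand_score(km_labels, hier_labels)
print(f"Adjusted Rand index between K-Means and Ward linkage: {ari:.3f}")
```

A high agreement between the two methods would support the chosen $k$; passing `Z` to `scipy.cluster.hierarchy.dendrogram` draws the corresponding tree for visual inspection.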