# Customer Segmentation Strategy for Retail Growth

**Project Objective:**
To identify distinct customer segments within mall data using unsupervised machine learning (K-Means Clustering). This analysis aims to enable targeted marketing strategies by grouping customers based on demographics and spending behavior.

**Tech Stack:**
* **Language:** Python
* **Data Handling:** Pandas, NumPy
* **Visualization:** Matplotlib, Seaborn, Plotly (3D)
* **ML Model:** K-Means Clustering
* **Preprocessing:** Scikit-Learn (StandardScaler)

In [None]:
# 1. Environment Setup
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px  # For interactive 3D plots
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Set aesthetic visualization style
sns.set_theme(style="whitegrid", palette="muted")
import warnings
warnings.filterwarnings('ignore')  # Hides ALL warnings for a clean presentation; remove this line while debugging

In [None]:
# 2. Data Loading & Initial Audit
try:
    df = pd.read_csv("Mall_Customers.csv")
except FileNotFoundError:
    print("Error: 'Mall_Customers.csv' not found. Please ensure the dataset is in the working directory.")
    raise  # Stop here: every later cell depends on df

print("Dataset loaded successfully.")
print(f"Data Shape: {df.shape}")

# Display the first few rows and a statistical summary
display(df.head())
display(df.describe())
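
An initial audit usually also checks for missing values and duplicated rows, which `head()` and `describe()` do not surface directly. The sketch below shows one way to do that. The `sample` frame and its values are invented for illustration (only the column names come from this notebook), so treat it as a pattern to adapt rather than a result from the mall data.

```python
import pandas as pd

# Hypothetical stand-in for the mall data: column names match the notebook,
# values are made up so that one missing value and one duplicate row exist.
sample = pd.DataFrame({
    "Age": [19, 35, None, 35],
    "Annual Income (k$)": [15, 60, 78, 60],
    "Spending Score (1-100)": [39, 81, 6, 81],
})

missing_per_column = sample.isna().sum()   # number of NaN values in each column
duplicate_rows = sample.duplicated().sum() # rows identical to an earlier row

print(missing_per_column)
print(f"Fully duplicated rows: {duplicate_rows}")
```

On the real data, the same two calls on `df` would show whether any rows need to be dropped or imputed before clustering.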

In [None]:
# 3. Exploratory Data Analysis (EDA)
# Analyzing the distribution of key metrics to understand customer demographics
plt.figure(figsize=(16, 5))

plt.subplot(1, 3, 1)
sns.histplot(df['Age'], kde=True, color='royalblue')
plt.title('Age Distribution')

plt.subplot(1, 3, 2)
sns.histplot(df['Annual Income (k$)'], kde=True, color='seagreen')
plt.title('Annual Income Distribution')

plt.subplot(1, 3, 3)
sns.histplot(df['Spending Score (1-100)'], kde=True, color='crimson')
plt.title('Spending Score Distribution')

plt.tight_layout()
plt.show()

In [None]:
# 4. Data Preprocessing
# We select the relevant features for clustering: Age, Income, and Spending Score.
features = ['Age', 'Annual Income (k$)', 'Spending Score (1-100)']
X = df[features]

# Scaling the features
# K-Means uses distance calculations, so we must scale variables to have a mean of 0 and variance of 1.
# This prevents high-value columns (like Income) from dominating the algorithm.
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

print("Data scaled successfully.")
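
To make the scaling step concrete, the following sketch checks that `StandardScaler` is equivalent to subtracting each column's mean and dividing by its population standard deviation. The array `X_demo` holds invented values standing in for age, income, and spending score; it is not taken from the dataset.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Invented rows standing in for [Age, Annual Income (k$), Spending Score (1-100)]
X_demo = np.array([
    [19.0, 15.0, 39.0],
    [35.0, 60.0, 81.0],
    [52.0, 78.0, 6.0],
    [28.0, 120.0, 55.0],
])

X_demo_scaled = StandardScaler().fit_transform(X_demo)

# Manual z-score per column; np.std defaults to the population std (ddof=0),
# which is what StandardScaler uses as well.
manual = (X_demo - X_demo.mean(axis=0)) / X_demo.std(axis=0)

print("Matches manual z-score:", np.allclose(X_demo_scaled, manual))
print("Column means:", X_demo_scaled.mean(axis=0).round(6))
print("Column stds: ", X_demo_scaled.std(axis=0).round(6))
```

After scaling, a one-unit difference means "one standard deviation" in every column, so no single feature dominates the Euclidean distances that K-Means relies on.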

In [None]:
# 5. The Elbow Method
# Determining the optimal number of clusters (K) by minimizing Within-Cluster Sum of Squares (WCSS).

wcss = []
for i in range(1, 11):
    # n_init is set explicitly because its default differs across scikit-learn versions
    kmeans = KMeans(n_clusters=i, init='k-means++', n_init=10, random_state=42)
    kmeans.fit(X_scaled)
    wcss.append(kmeans.inertia_)

plt.figure(figsize=(10, 5))
plt.plot(range(1, 11), wcss, marker='o', linestyle='--', color='crimson')
plt.title('Elbow Method Analysis')
plt.xlabel('Number of clusters (K)')
plt.ylabel('WCSS')
plt.grid(True)
plt.show()

# Insight: The curve bends around K=5, so the analysis proceeds with 5 clusters.
# The elbow is a visual heuristic: confirm the bend on the plot above before relying on it.
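
Because the elbow is read by eye, a numeric check can help when the bend is ambiguous. One common option is the silhouette score, which this notebook does not otherwise use; the sketch below is an assumption about how it could be added, not part of the original analysis. It runs on synthetic blobs with four known groups (not the mall data) so that the expected behaviour is easy to verify.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic data: four tight, well-separated groups at hand-picked centers.
# This is NOT the mall dataset; it only demonstrates the technique.
demo_centers = [[0, 0], [10, 0], [0, 10], [10, 10]]
X_blobs, _ = make_blobs(n_samples=300, centers=demo_centers,
                        cluster_std=0.6, random_state=42)

scores = {}
for k in range(2, 9):  # the silhouette score needs at least 2 clusters
    labels = KMeans(n_clusters=k, init="k-means++", n_init=10,
                    random_state=42).fit_predict(X_blobs)
    scores[k] = silhouette_score(X_blobs, labels)

best_k = max(scores, key=scores.get)  # K with the highest mean silhouette
for k, score in scores.items():
    print(f"K={k}: silhouette={score:.3f}")
print(f"Highest silhouette score at K={best_k}")
```

On the notebook's data, the same loop over `X_scaled` would give a second opinion on the value of K suggested by the elbow plot. The two methods do not always agree, and a disagreement is itself useful information.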

In [None]:
# 6. Model Training & 3D Visualization
# Training the K-Means model with 5 clusters
# n_init is set explicitly because its default differs across scikit-learn versions
kmeans_final = KMeans(n_clusters=5, init='k-means++', n_init=10, random_state=42)
df['Cluster'] = kmeans_final.fit_predict(X_scaled)

# Interactive 3D Scatter Plot using Plotly
# This visualizes the separation between customer groups across all three dimensions.
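# Cast cluster labels to strings so Plotly uses discrete colors instead of a continuous color scale.
plot_df = df.assign(Cluster=df['Cluster'].astype(str))
fig = px.scatter_3d(plot_df, x='Age', y='Annual Income (k$)', z='Spending Score (1-100)',
                    color='Cluster', opacity=0.8,
                    title="3D Visualization of Customer Segments",
                    labels={'Cluster': 'Segment ID'},
                    template='plotly_white')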
fig.show()

### 7. Strategic Business Recommendations
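Based on the clusters identified above, we can define the following customer profiles and strategies. K-Means assigns cluster IDs arbitrarily, so confirm which ID corresponds to which profile by inspecting each cluster's mean age, income, and spending score before acting on the table.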

| Segment ID | Profile Name | Characteristics | Marketing Strategy |
| :--- | :--- | :--- | :--- |
| **0** | **Steady Consumers** | Middle-aged, average income, average spend. | Keep them engaged with loyalty programs. |
| **1** | **Target Customers** | Young to middle-aged, high income, high spend. | Exclusive offers, luxury brand invites, personal shoppers. |
| **2** | **Sensible Spenders** | Older, high income, low spend. | Value-based offers, retirement/lifestyle planning focus. |
| **3** | **Carefree Spenders** | Young, low income, high spend. | Trend-based social media ads, fast-fashion alerts. |
| **4** | **Budget Conscious** | Any age, low income, low spend. | Flash sales, discount coupons, bulk-buy offers. |
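
The profile names in the table depend on what each cluster actually contains, so it helps to summarise the clusters numerically before labelling them. The sketch below shows one way to do this with a grouped aggregation. The `demo` frame, its three clusters, and all of its values are invented for illustration; only the column names come from this notebook.

```python
import pandas as pd

# Invented rows standing in for the clustered DataFrame (not real results)
demo = pd.DataFrame({
    "Age": [22, 25, 48, 52, 33, 31],
    "Annual Income (k$)": [20, 24, 85, 90, 88, 92],
    "Spending Score (1-100)": [80, 76, 15, 12, 85, 90],
    "Cluster": [0, 0, 1, 1, 2, 2],
})

# One row per cluster: size plus the mean of each clustering feature
profile = (
    demo.groupby("Cluster")
        .agg(
            customers=("Age", "size"),
            mean_age=("Age", "mean"),
            mean_income=("Annual Income (k$)", "mean"),
            mean_spend=("Spending Score (1-100)", "mean"),
        )
        .round(1)
)
print(profile)
```

Running the same aggregation on `df` after the clustering cell would show whether, for example, the cluster labelled 1 really is the high-income, high-spend group, and how many customers each strategy would reach.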