🛍️ Customer Segmentation — Mall Customers Clustering

🎯 An unsupervised machine learning project that segments mall customers into meaningful groups using K-Means and DBSCAN clustering — helping businesses identify high-value customers and design targeted marketing strategies.

🎯 Problem Statement

Businesses struggle to understand who their customers really are. Without customer segmentation:

❌ Marketing campaigns target everyone, which wastes budget
❌ No distinction between premium and budget customers
❌ Missed opportunities to retain high-value customers

Solution: Use clustering algorithms to automatically group customers by income, spending behavior, age, and gender — no labels needed.

📂 Dataset — Mall Customers (Kaggle)

Feature	Description
`CustomerID`	Unique customer identifier
`Gender`	Male / Female
`Age`	Customer age
`Annual Income (k$)`	Annual income in thousands of US dollars
`Spending Score (1-100)`	Score from 1 to 100 assigned by the mall based on customer behavior

Source: Mall Customers Dataset — Kaggle

🔄 Pipeline

Raw Data (Mall_Customers.csv)
        ↓
Data Preprocessing
├── Handle missing values
├── Encode Gender (Label Encoding)
└── Feature Scaling (StandardScaler)
        ↓
Finding Optimal K
├── Elbow Method (WCSS)
└── Silhouette Score
        ↓
Clustering
├── K-Means — spherical clusters
└── DBSCAN — density-based + outlier detection
        ↓
Dimensionality Reduction
└── PCA (2D projection for visualization)
        ↓
Results
├── Cluster labels saved → mall_customers_with_clusters.csv
└── Business insights per segment
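
The preprocessing stage of the pipeline could look roughly like the sketch below. It is not the notebook's exact code: the helper name `preprocess` and the tiny inline sample frame are assumptions for illustration, and rows with missing values are simply dropped here, which may differ from what the notebook does. The column names come from the dataset table.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, StandardScaler

FEATURES = ["Gender", "Age", "Annual Income (k$)", "Spending Score (1-100)"]


def preprocess(df: pd.DataFrame):
    """Return (cleaned frame, scaled feature matrix). Hypothetical helper."""
    # Assumption: rows with missing feature values are dropped, not imputed.
    clean = df.dropna(subset=FEATURES).copy()
    # Label-encode Gender; classes are sorted, so Female -> 0 and Male -> 1.
    clean["Gender"] = LabelEncoder().fit_transform(clean["Gender"])
    # Standardize each feature to zero mean and unit variance.
    X_scaled = StandardScaler().fit_transform(clean[FEATURES])
    return clean, X_scaled


# Made-up rows standing in for pd.read_csv("Mall_Customers.csv").
sample = pd.DataFrame({
    "CustomerID": [1, 2, 3, 4],
    "Gender": ["Male", "Female", "Female", "Male"],
    "Age": [19, 35, 52, 28],
    "Annual Income (k$)": [15, 60, 88, 40],
    "Spending Score (1-100)": [39, 81, 12, 55],
})
clean, X_scaled = preprocess(sample)
print(X_scaled.shape)
```

Scaling matters here because both K-Means and DBSCAN rely on Euclidean distance, so an unscaled income column would otherwise dominate a 0/1 gender column.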

👥 Customer Segments Discovered

Cluster	Profile	Strategy
💎 High Income, High Spending	Premium customers	VIP loyalty programs
🛒 Low Income, High Spending	Impulsive spenders	Installment (EMI) offers, deals
📉 High Income, Low Spending	Untapped potential	Targeted campaigns
💼 Middle Income, Average Spending	Regular customers	Retention discounts
👴 Older, Conservative	Low engagement	Senior programs
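
Segment profiles like the ones in the table above are typically read off per-cluster averages. The sketch below shows one way to compute them with pandas. The rows are made up, and the `Cluster` column name is an assumption about how labels are stored in `mall_customers_with_clusters.csv`.

```python
import pandas as pd

# Made-up rows standing in for mall_customers_with_clusters.csv;
# the "Cluster" column name is an assumption.
df = pd.DataFrame({
    "Age": [25, 30, 45, 50, 60, 65],
    "Annual Income (k$)": [90, 100, 85, 95, 30, 20],
    "Spending Score (1-100)": [85, 95, 15, 25, 40, 50],
    "Cluster": [0, 0, 1, 1, 2, 2],
})

# Mean of each feature per cluster, plus the number of customers in it.
profile = df.groupby("Cluster").mean()
profile["Customers"] = df.groupby("Cluster").size()
print(profile)
```

In this toy data, cluster 0 has both high mean income and a high mean spending score, which is the pattern the premium segment describes.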

🧠 Algorithms Used

K-Means Clustering

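from sklearn.cluster import KMeans

# X_scaled is the standardized feature matrix from the preprocessing step.

# Elbow method: record WCSS (inertia) for K = 1..10 and look for the bend
wcss = []
for k in range(1, 11):
    # n_init is set explicitly so results match across scikit-learn versions
    km = KMeans(n_clusters=k, n_init=10, random_state=42)
    km.fit(X_scaled)
    wcss.append(km.inertia_)

# Final model with the chosen K = 5
kmeans = KMeans(n_clusters=5, n_init=10, random_state=42)
labels = kmeans.fit_predict(X_scaled)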
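
The pipeline also lists the silhouette score as a check on K, which the elbow loop does not cover. A minimal sketch of that check follows. It runs on synthetic blobs from `make_blobs` as a stand-in for `X_scaled`, so the K it selects says nothing about the mall data; the variable names are illustrative.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic stand-in for X_scaled: 200 points drawn around 5 centers.
X_demo, _ = make_blobs(n_samples=200, centers=5, cluster_std=0.6, random_state=42)

scores = {}
for k in range(2, 11):  # the silhouette score needs at least 2 clusters
    km = KMeans(n_clusters=k, n_init=10, random_state=42)
    scores[k] = silhouette_score(X_demo, km.fit_predict(X_demo))

# The K with the highest mean silhouette is the candidate to compare
# against the bend in the elbow curve.
best_k = max(scores, key=scores.get)
print(best_k, round(scores[best_k], 3))
```

Scores range from -1 to 1. Higher values mean points sit closer to their own cluster than to the nearest neighboring one.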

DBSCAN (Bonus)

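from sklearn.cluster import DBSCAN

# Density-based: finds non-spherical clusters and flags outliers.
# eps is a distance threshold, so it should be tuned on the scaled features.
db = DBSCAN(eps=0.5, min_samples=5)
db_labels = db.fit_predict(X_scaled)  # separate name keeps the K-Means labels intact
# A label of -1 marks a noise point (outlier)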
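
A quick way to sanity-check a choice of `eps` and `min_samples` is to count how many clusters and noise points DBSCAN returns. The sketch below does that on synthetic data (two tight blobs plus one distant point), not on the mall dataset. The variable names are illustrative.

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)
# Two tight synthetic blobs plus a single far-away point.
X_demo = np.vstack([
    rng.normal(loc=0.0, scale=0.1, size=(30, 2)),
    rng.normal(loc=5.0, scale=0.1, size=(30, 2)),
    [[20.0, 20.0]],
])

db_labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X_demo)

# Noise points carry the label -1 and are not counted as a cluster.
n_noise = int(np.sum(db_labels == -1))
n_clusters = len(set(db_labels)) - (1 if -1 in db_labels else 0)
print(n_clusters, n_noise)
```

If most points come back as noise, `eps` is probably too small for the scaled feature space. If everything merges into one cluster, it is probably too large.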

PCA for Visualization

from sklearn.decomposition import PCA

pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)
# Plot 2D clusters from multi-dimensional data
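
The plotting step mentioned in the comment above might look like the sketch below. It uses synthetic 4-feature data and its own K-Means labels as stand-ins for `X_scaled` and `labels`, so it illustrates the approach and does not reproduce the notebook's figure. The `Agg` backend line is only there so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line inside Jupyter
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA

# Synthetic 4-feature stand-in for X_scaled.
X_demo, _ = make_blobs(n_samples=200, n_features=4, centers=5, random_state=42)
labels_demo = KMeans(n_clusters=5, n_init=10, random_state=42).fit_predict(X_demo)

# Project to 2D and color each point by its cluster label.
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_demo)

fig, ax = plt.subplots(figsize=(6, 5))
ax.scatter(X_pca[:, 0], X_pca[:, 1], c=labels_demo, cmap="viridis", s=20)
ax.set_xlabel("PC 1")
ax.set_ylabel("PC 2")
ax.set_title("K-Means clusters in PCA space")

# Share of total variance retained by the 2D projection.
explained = pca.explained_variance_ratio_.sum()
print(f"Variance explained by 2 components: {explained:.1%}")
```

Checking the explained variance is worthwhile because a 2D projection that keeps only a small share of the variance can hide or exaggerate the separation between clusters.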

📊 Key Results

Optimal K = 5 clusters (Elbow + Silhouette Score)
K-Means works best when clusters are roughly spherical and well separated
DBSCAN flags outlier customers automatically by labeling them as noise (-1)
PCA confirms clear cluster separation in 2D space
Results saved to mall_customers_with_clusters.csv

🛠️ Tech Stack

Layer	Technology
Language	Python 3.x
Data Processing	Pandas, NumPy
ML Algorithms	Scikit-learn (K-Means, DBSCAN, PCA)
Visualization	Matplotlib, Seaborn
Notebook	Jupyter Notebook

🚀 Getting Started

# Clone the repo
git clone https://github.com/tashfeen786/CustomerSegmentation.git
cd CustomerSegmentation

# Install dependencies
pip install pandas numpy matplotlib seaborn scikit-learn jupyter

# Run notebook
jupyter notebook Task_02_Mall_Customers_Clustering_Project.ipynb

🏗️ Project Structure

CustomerSegmentation/
│
├── Task_02_Mall_Customers_Clustering_Project.ipynb  # Main notebook
├── Task_02_Mall_Customers_Clustering_Project.pdf    # PDF export
├── Mall_Customers.csv                               # Raw dataset
├── mall_customers_with_clusters.csv                 # Clustered output
└── README.md

🔮 Future Improvements

Hierarchical Clustering — dendrogram visualization
Plotly — interactive 3D cluster plots
Streamlit dashboard — interactive segmentation tool
RFM Analysis — Recency, Frequency, Monetary segmentation
Real e-commerce dataset — more complex features

👨‍💻 Author

Tashfeen Aziz — AI/ML Engineer & Python Developer

⭐ If you found this project helpful, please give it a star!
