tashfeen786/CustomerSegmentation

๐Ÿ›๏ธ Customer Segmentation โ€” Mall Customers Clustering

🎯 An unsupervised machine learning project that segments mall customers into meaningful groups using K-Means and DBSCAN clustering, helping businesses identify high-value customers and design targeted marketing strategies.


🎯 Problem Statement

Businesses struggle to understand who their customers really are. Without customer segmentation:

  • ❌ Marketing campaigns target everyone, which wastes budget
  • ❌ There is no distinction between premium and budget customers
  • ❌ Opportunities to retain high-value customers are missed

Solution: Use clustering algorithms to automatically group customers by income, spending behavior, age, and gender, with no labels needed.


📂 Dataset: Mall Customers (Kaggle)

| Feature | Description |
| --- | --- |
| CustomerID | Unique customer identifier |
| Gender | Male / Female |
| Age | Customer age |
| Annual Income (k$) | Yearly income in thousands of dollars |
| Spending Score (1-100) | Mall-assigned score based on behavior |

Source: Mall Customers Dataset (Kaggle)


🔄 Pipeline

Raw Data (Mall_Customers.csv)
        ↓
Data Preprocessing
├── Handle missing values
├── Encode Gender (Label Encoding)
└── Feature Scaling (StandardScaler)
        ↓
Finding Optimal K
├── Elbow Method (WCSS)
└── Silhouette Score
        ↓
Clustering
├── K-Means: spherical clusters
└── DBSCAN: density-based + outlier detection
        ↓
Dimensionality Reduction
└── PCA (2D projection for visualization)
        ↓
Results
├── Cluster labels saved → mall_customers_with_clusters.csv
└── Business insights per segment
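
The preprocessing stage of the pipeline could look roughly like the sketch below. It is not the notebook's actual code: the `preprocess` helper, the decision to drop rows with missing values, and the small inline DataFrame standing in for Mall_Customers.csv are all assumptions made for illustration. Column names follow the dataset description.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, StandardScaler

FEATURES = ["Gender", "Age", "Annual Income (k$)", "Spending Score (1-100)"]


def preprocess(df: pd.DataFrame):
    """Hypothetical helper: clean, encode, and scale the clustering features."""
    # Handle missing values (assumption: simply drop incomplete rows)
    clean = df.dropna(subset=FEATURES).copy()
    # Label-encode Gender; classes are sorted, so Female -> 0 and Male -> 1
    clean["Gender"] = LabelEncoder().fit_transform(clean["Gender"])
    # Standardize every feature to zero mean and unit variance
    X_scaled = StandardScaler().fit_transform(clean[FEATURES])
    return clean, X_scaled


# Stand-in rows; in the project this would be pd.read_csv("Mall_Customers.csv")
df = pd.DataFrame({
    "CustomerID": [1, 2, 3, 4, 5],
    "Gender": ["Male", "Female", "Female", None, "Male"],
    "Age": [19, 21, 35, 40, 52],
    "Annual Income (k$)": [15, 16, 60, 75, 120],
    "Spending Score (1-100)": [39, 81, 50, 20, 90],
})
clean, X_scaled = preprocess(df)
print(X_scaled.shape)
```

Scaling matters here because K-Means and DBSCAN both rely on distances, so an unscaled income column would otherwise dominate the 0/1 Gender column.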

👥 Customer Segments Discovered

| Cluster | Profile | Strategy |
| --- | --- | --- |
| 💎 High Income, High Spending | Premium customers | VIP loyalty programs |
| 🛒 Low Income, High Spending | Impulsive spenders | EMI offers, deals |
| 📉 High Income, Low Spending | Untapped potential | Targeted campaigns |
| 💼 Middle Income, Average | Regular customers | Retention discounts |
| 👴 Older, Conservative | Low engagement | Senior programs |
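
One way to arrive at segment profiles like these is to average each feature per cluster label. The sketch below uses made-up rows and a hypothetical `Cluster` column, so the numbers are illustrative only and are not the project's results.

```python
import pandas as pd

# Made-up rows with an already assigned label (hypothetical column "Cluster")
df = pd.DataFrame({
    "Age": [25, 30, 45, 50, 33, 60],
    "Annual Income (k$)": [90, 85, 20, 25, 88, 22],
    "Spending Score (1-100)": [85, 90, 15, 20, 10, 12],
    "Cluster": [0, 0, 1, 1, 2, 1],
})

# Cluster size plus the mean of each feature per cluster
profile = df.groupby("Cluster").agg(
    customers=("Age", "size"),
    mean_age=("Age", "mean"),
    mean_income=("Annual Income (k$)", "mean"),
    mean_spending=("Spending Score (1-100)", "mean"),
)
print(profile.round(1))
```

Reading the rows of `profile` side by side (for example high mean income with low mean spending) is what turns a numeric cluster ID into a named segment.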

🧠 Algorithms Used

K-Means Clustering

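from sklearn.cluster import KMeans

# X_scaled: the StandardScaler-transformed feature matrix from preprocessing
# n_init is set explicitly because its default differs across scikit-learn versions

# Elbow method: record the within-cluster sum of squares (WCSS) for K = 1..10
wcss = []
for k in range(1, 11):
    km = KMeans(n_clusters=k, n_init=10, random_state=42)
    km.fit(X_scaled)
    wcss.append(km.inertia_)

# Final model with the chosen K
kmeans = KMeans(n_clusters=5, n_init=10, random_state=42)
labels = kmeans.fit_predict(X_scaled)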
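
The elbow curve can be cross-checked with the silhouette score listed in the pipeline. The following is a minimal sketch, assuming synthetic blobs from `make_blobs` (with hand-picked centers) as a stand-in for the scaled customer features. It is not taken from the notebook.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for X_scaled (assumption: five well-separated blobs)
centers = [[0, 0], [10, 0], [0, 10], [10, 10], [5, 5]]
X, _ = make_blobs(n_samples=200, centers=centers, cluster_std=0.5, random_state=42)
X_scaled = StandardScaler().fit_transform(X)

# The silhouette score needs at least 2 clusters, so K starts at 2
scores = {}
for k in range(2, 11):
    km = KMeans(n_clusters=k, n_init=10, random_state=42)
    scores[k] = silhouette_score(X_scaled, km.fit_predict(X_scaled))

# Pick the K with the highest mean silhouette
best_k = max(scores, key=scores.get)
print(best_k, round(scores[best_k], 3))
```

Unlike WCSS, which always decreases as K grows, the silhouette score peaks at a specific K, which makes it a useful second opinion when the elbow is ambiguous.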

DBSCAN (Bonus)

from sklearn.cluster import DBSCAN

# Density-based: detects non-spherical clusters and outliers
db = DBSCAN(eps=0.5, min_samples=5)
db_labels = db.fit_predict(X_scaled)  # separate name so the K-Means labels are kept
# A label of -1 marks a noise point (outlier)
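
The values `eps=0.5` and `min_samples=5` are also scikit-learn's defaults, so they may need tuning for a given dataset. A common heuristic is to inspect the sorted distance to each point's k-th nearest neighbour and choose `eps` near the knee of that curve. The sketch below illustrates this on made-up data (two tight groups plus one outlier) and then counts clusters and noise points. The data and variable names are assumptions.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
# Stand-in data (assumption): two tight groups plus one far-away outlier
X_scaled = np.vstack([
    rng.normal(loc=0.0, scale=0.1, size=(30, 2)),
    rng.normal(loc=3.0, scale=0.1, size=(30, 2)),
    [[10.0, 10.0]],
])

min_samples = 5
# k-distance heuristic: distance from each point to its min_samples-th neighbour.
# Because X_scaled is passed explicitly, each point counts as its own first
# neighbour, which matches how DBSCAN counts min_samples.
nn = NearestNeighbors(n_neighbors=min_samples).fit(X_scaled)
distances, _ = nn.kneighbors(X_scaled)
k_dist = np.sort(distances[:, -1])  # plot this curve and look for the knee

db_labels = DBSCAN(eps=0.5, min_samples=min_samples).fit_predict(X_scaled)
n_clusters = len(set(db_labels) - {-1})  # -1 is the noise label, not a cluster
n_noise = int(np.sum(db_labels == -1))
print(n_clusters, n_noise)
```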

PCA for Visualization

from sklearn.decomposition import PCA

pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)
# Plot 2D clusters from multi-dimensional data
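
Before trusting a 2D plot, it helps to check how much variance the two components retain and to project the K-Means centroids with the same fitted PCA. This is a minimal sketch on random stand-in data (an assumption); the project's features would replace `X_scaled`.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
# Stand-in for the four scaled customer features (assumption: random data)
X_scaled = StandardScaler().fit_transform(rng.normal(size=(200, 4)))

kmeans = KMeans(n_clusters=5, n_init=10, random_state=42).fit(X_scaled)

pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)

# Share of total variance kept by the two components; a low value means
# the 2D picture hides much of the structure
retained = pca.explained_variance_ratio_.sum()

# Centroids live in the scaled feature space, so project them with the same PCA
centers_2d = pca.transform(kmeans.cluster_centers_)
print(X_pca.shape, centers_2d.shape, round(float(retained), 3))
```

The columns `X_pca[:, 0]` and `X_pca[:, 1]` can then be passed to a scatter plot coloured by `kmeans.labels_`, with `centers_2d` overlaid as markers.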

📊 Key Results

  • Optimal K = 5 clusters, chosen with the Elbow Method and Silhouette Score
  • K-Means performs better when clusters are roughly spherical and well separated
  • DBSCAN automatically flags outlier customers as noise points
  • The 2D PCA projection shows clear cluster separation
  • Results saved to mall_customers_with_clusters.csv
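
Saving the labelled data might look like the sketch below. The `Cluster` column name, the stand-in rows, and the use of three clusters on two features are assumptions chosen to keep the example small. The project itself reads Mall_Customers.csv and uses K = 5.

```python
import os
import tempfile

import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Stand-in rows (assumption); the project reads Mall_Customers.csv instead
df = pd.DataFrame({
    "CustomerID": range(1, 11),
    "Annual Income (k$)": [15, 16, 17, 18, 60, 62, 64, 120, 125, 130],
    "Spending Score (1-100)": [80, 85, 78, 90, 50, 48, 52, 15, 10, 12],
})
features = ["Annual Income (k$)", "Spending Score (1-100)"]
X_scaled = StandardScaler().fit_transform(df[features])

# Attach one label per customer; "Cluster" is a hypothetical column name
df["Cluster"] = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X_scaled)

# Written to a temp directory here so nothing lands in the working directory
out_path = os.path.join(tempfile.mkdtemp(), "mall_customers_with_clusters.csv")
df.to_csv(out_path, index=False)

# Read it back to confirm the labels survived the round trip
reloaded = pd.read_csv(out_path)
print(reloaded["Cluster"].nunique())
```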

🛠️ Tech Stack

| Layer | Technology |
| --- | --- |
| Language | Python 3.x |
| Data Processing | Pandas, NumPy |
| ML Algorithms | Scikit-learn (K-Means, DBSCAN, PCA) |
| Visualization | Matplotlib, Seaborn |
| Notebook | Jupyter Notebook |

🚀 Getting Started

# Clone the repo
git clone https://github.com/tashfeen786/CustomerSegmentation.git
cd CustomerSegmentation

# Install dependencies
pip install pandas numpy matplotlib seaborn scikit-learn jupyter

# Run notebook
jupyter notebook Task_02_Mall_Customers_Clustering_Project.ipynb

๐Ÿ—๏ธ Project Structure

CustomerSegmentation/
โ”‚
โ”œโ”€โ”€ Task_02_Mall_Customers_Clustering_Project.ipynb  # Main notebook
โ”œโ”€โ”€ Task_02_Mall_Customers_Clustering_Project.pdf    # PDF export
โ”œโ”€โ”€ Mall_Customers.csv                               # Raw dataset
โ”œโ”€โ”€ mall_customers_with_clusters.csv                 # Clustered output
โ””โ”€โ”€ README.md

🔮 Future Improvements

  • Hierarchical Clustering: dendrogram visualization
  • Plotly: interactive 3D cluster plots
  • Streamlit dashboard: interactive segmentation tool
  • RFM Analysis: Recency, Frequency, Monetary segmentation
  • Real e-commerce dataset: more complex features

👨‍💻 Author

Tashfeen Aziz, AI/ML Engineer & Python Developer

LinkedIn GitHub Email


โญ If you found this project helpful, please give it a star!
