### Customer Segmentation



### Project Overview: Customer Segmentation Using Clustering

Clustering is a key unsupervised machine learning technique used to discover natural groupings within data, particularly when labels are unavailable. It plays a significant role across various scientific, engineering, and business domains, enabling deeper insights and data-driven decisions.

This project focuses on segmenting credit card customers, using clustering algorithms to analyze usage and payment behavior and identify patterns. By segmenting customers into meaningful groups, businesses can tailor their marketing strategies, improve customer engagement, and increase profitability.

---

### Dataset Features and Descriptions

Below are the features of the dataset, providing insights into customer credit card usage and payment behaviors:

1. **CUST_ID**: Unique identifier for each credit card holder.  
2. **BALANCE**: Average monthly balance over the past 12 months, based on daily balance averages.  
3. **BALANCE_FREQUENCY**: How often the balance was updated over the last 12 months, as a score between 0 and 1 (1 = frequently updated, 0 = not frequently updated).  
4. **PURCHASES**: Total purchase amount spent by the customer in the last 12 months.  
5. **ONEOFF_PURCHASES**: Total amount spent on one-off purchases.  
6. **INSTALLMENTS_PURCHASES**: Total amount spent on installment-based purchases.  
7. **CASH_ADVANCE**: Total amount withdrawn through cash advances.  
8. **PURCHASES_FREQUENCY**: How often purchases are made, as a score between 0 and 1 (1 = frequent, 0 = rare).  
9. **ONEOFF_PURCHASES_FREQUENCY**: How often one-off purchases are made, as a score between 0 and 1 (1 = frequent, 0 = rare).  
10. **PURCHASES_INSTALLMENTS_FREQUENCY**: How often installment-based purchases are made, as a score between 0 and 1 (1 = frequent, 0 = rare).  
11. **CASHADVANCE_FREQUENCY**: How often cash advances are taken, as a score between 0 and 1 (1 = frequent, 0 = rare).  
12. **CASH_ADVANCE_TRX**: Number of cash-advance transactions.  
13. **PURCHASES_TRX**: Number of purchase transactions.  
14. **CREDIT_LIMIT**: Credit limit assigned to the customer.  
15. **PAYMENTS**: Total payments made by the customer to reduce their statement balance.  
16. **MINIMUM_PAYMENTS**: Total minimum payments due during the period.  
17. **PRC_FULL_PAYMENT**: Percentage of months in which the customer paid their full statement balance.  
18. **TENURE**: Number of months the customer has held their credit card.  
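
The frequency features above are described as scores running from 0 (rare) to 1 (frequent), so a quick range check can catch loading or encoding problems early. The following is a minimal sketch under that assumption: the helper `out_of_range_frequencies` and the toy values are hypothetical and are not part of the original notebook.

```python
import pandas as pd

# Frequency-type columns, named as in the feature list above.
FREQUENCY_COLUMNS = [
    "BALANCE_FREQUENCY",
    "PURCHASES_FREQUENCY",
    "ONEOFF_PURCHASES_FREQUENCY",
    "PURCHASES_INSTALLMENTS_FREQUENCY",
    "CASHADVANCE_FREQUENCY",
]


def out_of_range_frequencies(df):
    """Return the frequency columns present in df that hold values outside [0, 1]."""
    present = [c for c in FREQUENCY_COLUMNS if c in df.columns]
    # Missing values are ignored here; they are a separate concern.
    return [c for c in present if not df[c].dropna().between(0, 1).all()]


# Toy frame with made-up values (hypothetical, not taken from the dataset).
toy = pd.DataFrame({
    "PURCHASES_FREQUENCY": [0.0, 0.5, 1.0],
    "BALANCE_FREQUENCY": [0.9, 1.2, 1.0],  # 1.2 is deliberately invalid
})

result = out_of_range_frequencies(toy)
print(result)
```

Columns that are absent from the frame are skipped rather than reported, so the check also works if the CSV uses slightly different column names.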

---

### Objective and Value
By applying clustering to this dataset, the project aims to group customers based on their credit card usage patterns. These insights can help businesses:  
- **Personalize Offers**: Tailor marketing strategies to meet specific customer needs.  
- **Enhance Retention**: Identify high-value customers and implement retention strategies.  
- **Optimize Services**: Understand diverse customer behaviors to improve product offerings and financial services.  

This project demonstrates how unsupervised learning algorithms, such as K-means clustering, can transform raw data into actionable business intelligence.
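
As a rough illustration of the idea, the sketch below runs K-means on synthetic data. The two columns only mimic features such as BALANCE and PURCHASES, and the group centers, the choice of `k=3`, and the random seed are all assumptions made for the example, not results from the dataset.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for two usage features (hypothetical values):
# three loosely separated customer groups, 100 customers each.
rng = np.random.default_rng(42)
centers = np.array([[500.0, 200.0], [3000.0, 4000.0], [6000.0, 500.0]])
X = np.vstack([rng.normal(loc=c, scale=150.0, size=(100, 2)) for c in centers])

# Scale first: K-means relies on Euclidean distance, so features with
# large raw values would otherwise dominate the result.
X_scaled = StandardScaler().fit_transform(X)

# Fit K-means with an assumed k=3; on real data k is usually chosen by
# comparing several values (for example with silhouette scores).
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(X_scaled)

# Silhouette score: closer to 1 means tighter, better separated clusters.
score = silhouette_score(X_scaled, labels)
print(f"clusters found: {len(set(labels))}, silhouette: {score:.2f}")
```

On the real data the same steps would apply to the numeric columns, after dropping `CUST_ID` and handling any missing values.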



In [4]:
# Data
import pandas as pd
import numpy as np
from scipy import stats

# Optional helpers (currently unused)
# from collections import Counter  # counting elements in a list
# from tqdm import tqdm            # progress bars

# Visualization
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
from plotly.subplots import make_subplots
from termcolor import colored

# Algorithms
from itertools import product
from sklearn.cluster import AgglomerativeClustering, DBSCAN, KMeans, MiniBatchKMeans
from sklearn.mixture import GaussianMixture
from sklearn.neighbors import NearestNeighbors
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier, GradientBoostingClassifier

# Train Test Split
from sklearn.model_selection import train_test_split


# scipy library for hierarchical clustering
from scipy.cluster.hierarchy import ward, dendrogram, linkage
from scipy.cluster import hierarchy

# Evaluation metrics
from sklearn import metrics
from sklearn.metrics import accuracy_score
from sklearn.metrics import silhouette_score, calinski_harabasz_score

# Scaling
from sklearn.preprocessing import StandardScaler

# Suppress warnings
import warnings
warnings.filterwarnings('ignore')

In [5]:
data = pd.read_csv('Customer_Data.csv')
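
Before clustering, it usually helps to check column types, missing values, and value ranges, since K-means cannot handle missing values and is sensitive to scale. The sketch below shows one way to do this. The helper `summarize` and the toy rows are hypothetical and only stand in for the loaded `data` frame.

```python
import numpy as np
import pandas as pd


def summarize(df):
    """Return dtype and missing-value count per column, plus min/max for numeric columns."""
    numeric = df.select_dtypes(include="number")
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "n_missing": df.isna().sum(),
        "min": numeric.min(),  # NaN for non-numeric columns
        "max": numeric.max(),
    })


# Toy rows with made-up values (hypothetical, not read from Customer_Data.csv).
toy = pd.DataFrame({
    "CUST_ID": ["C10001", "C10002", "C10003"],
    "BALANCE": [40.9, 3202.5, 2495.1],
    "PURCHASES_FREQUENCY": [0.17, 0.0, 1.0],
    "MINIMUM_PAYMENTS": [139.5, np.nan, 627.3],
})

report = summarize(toy)
print(report)
```

In the notebook, the same helper could be called as `summarize(data)` right after loading the CSV.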