Customer segmentation is a crucial task in data analytics, as it helps businesses target different customer groups more effectively. Here are some of the key techniques for customer segmentation:

### 1. **Demographic Segmentation**
   - **What it is**: Grouping customers based on attributes like age, gender, income, education, occupation, and marital status.
   - **Use Case**: A retail company might segment customers into age groups to design targeted marketing strategies for each group.

### 2. **Geographic Segmentation**
   - **What it is**: Dividing customers based on geographic regions, such as city, state, country, or even neighborhood.
   - **Use Case**: A business can target customers in specific regions with localized promotions or products.

### 3. **Behavioral Segmentation**
   - **What it is**: Segmenting customers based on behaviors, such as purchase history, product usage, brand loyalty, and engagement patterns.
   - **Use Case**: E-commerce companies can create segments based on browsing history, abandoned carts, or frequent purchases.

### 4. **Psychographic Segmentation**
   - **What it is**: Grouping customers based on psychological traits, lifestyles, values, interests, and opinions.
   - **Use Case**: A luxury brand might segment customers based on values like exclusivity and social status.

### 5. **RFM (Recency, Frequency, Monetary) Analysis** (demonstrated in the notebook below)
   - **What it is**: RFM segments customers by how recently they last purchased (recency), how often they purchase (frequency), and how much they have spent in total (monetary value).
   - **Use Case**: A company can identify its most loyal and valuable customers and create strategies to retain them.

### 6. **Cluster Analysis (Unsupervised Learning)**
   - **What it is**: Using algorithms like K-means, DBSCAN, or hierarchical clustering to group customers based on similar attributes.
   - **Use Case**: A telecommunications company can cluster customers based on their usage patterns (e.g., high data users vs. voice-only users).

### 7. **Decision Trees**
   - **What it is**: A supervised learning method used to classify customers based on certain decision criteria.
   - **Use Case**: A financial institution might segment customers based on their likelihood to default on loans.

### 8. **K-means Clustering**
   - **What it is**: An algorithm that partitions data points into 'K' clusters by assigning each point to its nearest cluster centroid and minimizing the within-cluster sum of squared distances.
   - **Use Case**: E-commerce sites can use K-means to segment customers based on purchasing patterns.
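
To make the assign-and-update loop concrete, here is a minimal pure-Python sketch of K-means (Lloyd's algorithm). The customer tuples and the choice of the first `k` points as starting centroids are assumptions made for illustration; in practice a library implementation with scaled features and a better initialisation would normally be preferred.

```python
import math

def kmeans(points, k, n_iter=20):
    """Plain Lloyd's algorithm on a list of equal-length numeric tuples."""
    # Naive deterministic initialisation: the first k points act as centroids.
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids

# Hypothetical customers described by (orders per year, average basket value).
customers = [(2, 20.0), (40, 310.0), (3, 25.0), (38, 290.0), (1, 15.0), (42, 300.0)]
labels, centroids = kmeans(customers, k=2)
print(labels)
print(centroids)
```

On this toy data the occasional buyers and the heavy buyers end up in separate clusters. Because the two features are on different scales, the larger one dominates the distance, which is why real pipelines usually standardise features first.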

### 9. **Latent Class Analysis**
   - **What it is**: A model-based statistical method that identifies unobserved (latent) subgroups, or classes, of customers from their observed behaviors or preferences.
   - **Use Case**: A streaming service might segment users into those who prefer specific types of content (e.g., movies vs. TV series).

### 10. **Customer Lifetime Value (CLV) Segmentation**
   - **What it is**: Grouping customers based on their expected lifetime value, calculated using past interactions.
   - **Use Case**: Retailers can allocate marketing resources to high-value customers while reducing efforts on lower-value segments.
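
As a rough sketch of how a value could be estimated from past interactions, the snippet below multiplies average order value by yearly purchase rate and an expected relationship length. This formula is only one common simplification, and the function names, tier thresholds, and sample numbers are assumptions chosen for illustration.

```python
def simple_clv(order_values, years_observed, expected_years):
    """Estimate lifetime value as average order value x orders per year x years."""
    avg_order_value = sum(order_values) / len(order_values)
    orders_per_year = len(order_values) / years_observed
    return avg_order_value * orders_per_year * expected_years

def clv_tier(clv, high=1000.0, low=200.0):
    """Map a CLV estimate to a coarse tier (thresholds are hypothetical)."""
    if clv >= high:
        return "high"
    if clv >= low:
        return "mid"
    return "low"

# Hypothetical customer: three orders in one observed year, expected to stay 3 years.
estimate = simple_clv([50, 70, 60], years_observed=1.0, expected_years=3)
print(estimate, clv_tier(estimate))
```

More careful approaches discount future revenue and model the probability that a customer is still active, which this sketch deliberately ignores.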

### 11. **Churn Prediction Models**
   - **What it is**: Predicting which customer segments are more likely to stop using a product or service based on historical data.
   - **Use Case**: A subscription-based business can target at-risk customers with special offers to retain them.

### 12. **Market Basket Analysis**
   - **What it is**: A technique to analyze the relationships between products purchased together.
   - **Use Case**: A grocery store might segment customers based on their purchase of complementary products (e.g., bread and butter).
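
The relationship between products bought together is usually summarised with support (how often a pair appears) and confidence (how often the second item appears given the first). The following is a minimal sketch using only the standard library; the baskets are made-up data.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions, one set of items per basket.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "eggs"},
]

# Count how many baskets contain each item and each unordered pair of items.
item_counts = Counter(item for basket in baskets for item in basket)
pair_counts = Counter(
    pair for basket in baskets for pair in combinations(sorted(basket), 2)
)

def support(item_a, item_b):
    """Share of all baskets that contain both items."""
    return pair_counts[tuple(sorted((item_a, item_b)))] / len(baskets)

def confidence(antecedent, consequent):
    """Share of baskets with the antecedent that also contain the consequent."""
    return pair_counts[tuple(sorted((antecedent, consequent)))] / item_counts[antecedent]

print(support("bread", "butter"))
print(confidence("butter", "bread"))
print(confidence("bread", "butter"))
```

Confidence is directional: here every basket with butter also has bread, but not every basket with bread has butter.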

In [1]:
!pip install plotly



In [2]:
!pip install nbformat --upgrade



In [3]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

In [4]:
#import datetime as dt
from datetime import datetime as dt, timedelta
import plotly.express as px
import plotly.graph_objects as go
import plotly.colors

In [92]:
# Read the data
data = pd.read_csv('/content/online_retail.csv')
data.head()

,InvoiceNo,StockCode,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country
0,536365,85123A,WHITE HANGING HEART T-LIGHT HOLDER,6,2010-12-01 08:26:00,2.55,17850.0,United Kingdom
1,536365,71053,WHITE METAL LANTERN,6,2010-12-01 08:26:00,3.39,17850.0,United Kingdom
2,536365,84406B,CREAM CUPID HEARTS COAT HANGER,8,2010-12-01 08:26:00,2.75,17850.0,United Kingdom
3,536365,84029G,KNITTED UNION FLAG HOT WATER BOTTLE,6,2010-12-01 08:26:00,3.39,17850.0,United Kingdom
4,536365,84029E,RED WOOLLY HOTTIE WHITE HEART.,6,2010-12-01 08:26:00,3.39,17850.0,United Kingdom


In [93]:
data.dropna(subset=['CustomerID'], inplace=True)

data['InvoiceDate'] = pd.to_datetime(data['InvoiceDate'])
data['TotalAmount'] = data['Quantity'] * data['UnitPrice']
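
If this file follows the convention of the public Online Retail dataset, cancelled invoices carry an `InvoiceNo` starting with "C" and a negative `Quantity`, so they offset earlier purchases in `TotalAmount`. Whether to keep them depends on whether net or gross spend is wanted. A hedged sketch of filtering them out, using a tiny made-up frame rather than the real data:

```python
import pandas as pd

# Tiny stand-in for the retail data; the rows are invented for illustration.
sample = pd.DataFrame({
    "InvoiceNo": ["536365", "C536379", "536381"],
    "Quantity": [6, -1, 4],
    "UnitPrice": [2.55, 27.50, 1.25],
    "CustomerID": [17850.0, 14527.0, None],
})

# Flag cancellations by the assumed "C" prefix on the invoice number.
is_cancelled = sample["InvoiceNo"].astype(str).str.startswith("C")

# Keep rows with a known customer, a non-cancelled invoice and a positive quantity.
keep = sample["CustomerID"].notna() & ~is_cancelled & (sample["Quantity"] > 0)
clean = sample[keep].copy()
clean["TotalAmount"] = clean["Quantity"] * clean["UnitPrice"]
print(clean)
```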

In [94]:
data

,InvoiceNo,StockCode,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country,TotalAmount
0,536365,85123A,WHITE HANGING HEART T-LIGHT HOLDER,6,2010-12-01 08:26:00,2.55,17850.0,United Kingdom,15.30
1,536365,71053,WHITE METAL LANTERN,6,2010-12-01 08:26:00,3.39,17850.0,United Kingdom,20.34
2,536365,84406B,CREAM CUPID HEARTS COAT HANGER,8,2010-12-01 08:26:00,2.75,17850.0,United Kingdom,22.00
3,536365,84029G,KNITTED UNION FLAG HOT WATER BOTTLE,6,2010-12-01 08:26:00,3.39,17850.0,United Kingdom,20.34
4,536365,84029E,RED WOOLLY HOTTIE WHITE HEART.,6,2010-12-01 08:26:00,3.39,17850.0,United Kingdom,20.34
...,...,...,...,...,...,...,...,...,...
541904,581587,22613,PACK OF 20 SPACEBOY NAPKINS,12,2011-12-09 12:50:00,0.85,12680.0,France,10.20
541905,581587,22899,CHILDREN'S APRON DOLLY GIRL,6,2011-12-09 12:50:00,2.10,12680.0,France,12.60
541906,581587,23254,CHILDRENS CUTLERY DOLLY GIRL,4,2011-12-09 12:50:00,4.15,12680.0,France,16.60
541907,581587,23255,CHILDRENS CUTLERY CIRCUS PARADE,4,2011-12-09 12:50:00,4.15,12680.0,France,16.60


In [95]:
reference_date = pd.Timestamp(dt.now().date())
reference_date

Timestamp('2024-10-12 00:00:00')

In [96]:
reference_date = data['InvoiceDate'].max() + timedelta(days=1)
reference_date

Timestamp('2011-12-10 12:50:00')
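
Anchoring the reference date to the last transaction in the data, rather than to today's date, keeps recency meaningful for a historical dataset: measured from 2024, every customer here would look more than a decade stale. Adding one day means the most recent buyer gets a recency of 1 instead of 0. A small standard-library sketch with made-up customers:

```python
from datetime import datetime, timedelta

# Hypothetical last-purchase timestamps per customer.
last_purchase = {
    "A": datetime(2011, 12, 9, 12, 50),  # bought at the final timestamp in the data
    "B": datetime(2011, 11, 1, 9, 0),
}

# Reference point: one day after the latest transaction in the data.
reference_date = max(last_purchase.values()) + timedelta(days=1)

# Recency is the whole number of days between the reference date and the last purchase.
recency = {cid: (reference_date - ts).days for cid, ts in last_purchase.items()}
print(recency)
```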

In [97]:
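rfm = data.groupby('CustomerID').agg({
    # Recency: days between the reference date and the customer's latest purchase
    'InvoiceDate': lambda x: (reference_date - x.max()).days,
    # Frequency: number of invoice rows (line items), not distinct invoices
    'InvoiceNo': 'count',
    # Value: net amount across all of the customer's rows
    'TotalAmount': 'sum'
})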

rfm.rename(columns={'InvoiceDate': 'Recency', 'InvoiceNo': 'Frequency', 'TotalAmount': 'Value'}, inplace=True)
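
The aggregation above counts rows per customer, so Frequency is the number of invoice line items. If the number of distinct orders is wanted instead, `nunique` on `InvoiceNo` is one option. The sketch below shows that variant with pandas named aggregation on a small invented frame; `orders` and `rfm_sketch` are illustrative names, not part of the notebook.

```python
import pandas as pd

# Invented data: customer 1.0 has two invoices, one of them with two line items.
orders = pd.DataFrame({
    "CustomerID": [1.0, 1.0, 1.0, 2.0],
    "InvoiceNo": ["A1", "A1", "A2", "B1"],
    "InvoiceDate": pd.to_datetime(["2011-12-01", "2011-12-01", "2011-12-05", "2011-11-20"]),
    "TotalAmount": [10.0, 5.0, 20.0, 7.5],
})
reference_date = orders["InvoiceDate"].max() + pd.Timedelta(days=1)

# Named aggregation yields the final column names directly, so no rename is needed.
rfm_sketch = orders.groupby("CustomerID").agg(
    Recency=("InvoiceDate", lambda x: (reference_date - x.max()).days),
    Frequency=("InvoiceNo", "nunique"),  # distinct invoices rather than rows
    Value=("TotalAmount", "sum"),
)
print(rfm_sketch)
```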

In [98]:
rfm.head()

CustomerID,Recency,Frequency,Value
12346.0,326,2,0.0
12347.0,2,182,4310.0
12348.0,75,31,1797.24
12349.0,19,73,1757.55
12350.0,310,17,334.4


# The cell below is for debugging: it shows the RFM details for a specific customer ID.

In [99]:
# Get the specific group for a given CustomerID
customer_id = 12433.0
selected_group = rfm.loc[customer_id]

# Display the selected group (row details)
print(selected_group)


# data = pd.read_csv('/content/online_retail.csv')
# # Select the row for a specific CustomerID
# customer_id = 12433.0
# customer_id = 12395.0
# selected_row = data[data['CustomerID'] == customer_id]

# # Display the selected row
# print(selected_row)

# # Select the row for a specific CustomerID
# customer_id = 12395.0

# # Use a boolean mask on the index instead of a label lookup.
# selected_row = rfm.loc[rfm.index == customer_id]
# # This condition finds rows in `rfm` where the index equals `customer_id`.

# # Display the selected row if found.
# if not selected_row.empty:
#     print(selected_row)
# else:
#     print(f"Customer ID {customer_id} not found in the DataFrame.")

Recency          1.00
Frequency      420.00
Value        13375.87
Name: 12433.0, dtype: float64


# Below, the recency score is higher for customers who purchased more recently (lower Recency).
# The frequency and value scores increase as Frequency and Value increase.

In [100]:
# Define quantiles
quantiles = rfm.quantile(q=[0.25, 0.5, 0.75])

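# Assign RFM scores
def RScore(x, p, d):
    """Return a 1-4 quartile score for value x of metric p, using the quantile table d."""
    # Lower Recency means a more recent purchase, so its scale is reversed
    if p == 'Recency':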
        if x <= d[p][0.25]:
            return 4
        elif x <= d[p][0.50]:
            return 3
        elif x <= d[p][0.75]:
            return 2
        else:
            return 1
    else:
        if x <= d[p][0.25]:
            return 1
        elif x <= d[p][0.50]:
            return 2
        elif x <= d[p][0.75]:
            return 3
        else:
            return 4
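
Quartile scoring of this kind can also be sketched with `pd.qcut`, which bins a column into equal-sized quantile groups. This is an alternative to the function above rather than a drop-in replacement, since the two may treat values at the bin edges differently; the series below are invented.

```python
import pandas as pd

# Invented spend values; higher spend should receive a higher score.
values = pd.Series([5, 10, 20, 40, 80, 160, 320, 640], name="Value")
m_score = pd.qcut(values, q=4, labels=[1, 2, 3, 4]).astype(int)

# Invented recency in days; lower recency should receive a higher score,
# so the labels are listed in reverse order.
recency = pd.Series([1, 3, 10, 30, 60, 120, 200, 300], name="Recency")
r_score = pd.qcut(recency, q=4, labels=[4, 3, 2, 1]).astype(int)

print(m_score.tolist())
print(r_score.tolist())
```

Note that `pd.qcut` raises an error by default when heavy ties make two quantile edges identical, which can happen with skewed counts such as purchase frequency.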

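#Important note:

Customer ID 12347 scores high on all three dimensions (R=4, F=4, M=4): the customer purchased recently (Recency of 2 days), frequently (Frequency of 182), and with a high total spend (Value of 4310.0).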

In [101]:
rfm['R'] = rfm['Recency'].apply(RScore, args=('Recency', quantiles,))
rfm['F'] = rfm['Frequency'].apply(RScore, args=('Frequency', quantiles,))
rfm['M'] = rfm['Value'].apply(RScore, args=('Value', quantiles,))

rfm.head()

CustomerID,Recency,Frequency,Value,R,F,M
12346.0,326,2,0.0,1,1,1
12347.0,2,182,4310.0,4,4,4
12348.0,75,31,1797.24,2,2,4
12349.0,19,73,1757.55,3,3,4
12350.0,310,17,334.4,1,1,2


In [102]:
rfm['RFM_Segment'] = rfm['R'].astype(str) + rfm['F'].astype(str) + rfm['M'].astype(str)
rfm['RFM_Score'] = rfm[['R', 'F', 'M']].sum(axis=1)

rfm.head()

CustomerID,Recency,Frequency,Value,R,F,M,RFM_Segment,RFM_Score
12346.0,326,2,0.0,1,1,1,111,3
12347.0,2,182,4310.0,4,4,4,444,12
12348.0,75,31,1797.24,2,2,4,224,8
12349.0,19,73,1757.55,3,3,4,334,10
12350.0,310,17,334.4,1,1,2,112,4
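
`RFM_Segment` and `RFM_Score` carry different amounts of information: the concatenated string keeps the individual R, F, and M scores, while the sum collapses them. For example, segments 224 and 422 both sum to 8. The standard-library sketch below counts how many distinct segments share each score when every component ranges from 1 to 4.

```python
from collections import Counter
from itertools import product

# All possible (R, F, M) combinations with each score in 1..4.
combos = list(product(range(1, 5), repeat=3))

# Number of distinct RFM segments that collapse onto each summed score.
segments_per_score = Counter(sum(c) for c in combos)

for score in sorted(segments_per_score):
    print(score, segments_per_score[score])
```

Only the extreme scores, 3 and 12, correspond to a single segment; mid-range scores mix customers with quite different profiles.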


In [103]:
segment_labels = ['Low-Value', 'Mid-Value', 'High-Value']

def assign_segment(score):
    if score < 5:
        return "Low-Value"
    elif score < 9:
        return "Mid-Value"
    else:
        return "High-Value"

rfm['RFM_Segment_Label'] = rfm['RFM_Score'].apply(assign_segment)

rfm.head()

CustomerID,Recency,Frequency,Value,R,F,M,RFM_Segment,RFM_Score,RFM_Segment_Label
12346.0,326,2,0.0,1,1,1,111,3,Low-Value
12347.0,2,182,4310.0,4,4,4,444,12,High-Value
12348.0,75,31,1797.24,2,2,4,224,8,Mid-Value
12349.0,19,73,1757.55,3,3,4,334,10,High-Value
12350.0,310,17,334.4,1,1,2,112,4,Low-Value
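
The same thresholds can be expressed as bins with `pd.cut`, sketched here on a few invented scores. One practical difference is that `pd.cut` returns an ordered categorical, so sorting follows Low, Mid, High, whereas sorting plain strings is alphabetical and puts "High-Value" first.

```python
import pandas as pd

# Invented scores spanning the possible range 3..12.
scores = pd.Series([3, 4, 5, 8, 9, 12], name="RFM_Score")

# Right-inclusive bins: (2, 4] -> Low, (4, 8] -> Mid, (8, 12] -> High.
labels = pd.cut(
    scores,
    bins=[2, 4, 8, 12],
    labels=["Low-Value", "Mid-Value", "High-Value"],
)
print(labels.tolist())
print(labels.sort_values(ascending=False).tolist())
```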


In [104]:
segment_counts = rfm['RFM_Segment_Label'].value_counts().reset_index()
segment_counts.columns = ['RFM Segment', 'Count']
segment_counts = segment_counts.sort_values('RFM Segment')

# Create the bar chart using Plotly
fig = px.bar(segment_counts,
             x='RFM Segment',
             y='Count',
             title='Customer Distribution by RFM Segment',
             labels={'RFM Segment': 'RFM Segment', 'Count': 'Number of Customers'},
             color='RFM Segment',
             color_discrete_sequence=px.colors.qualitative.Pastel)

fig.show()

In [105]:
rfm['RFM_Customer_Segments'] = ""

# RFM_Score is an integer from 3 to 12; each value should fall in exactly one range
rfm.loc[rfm['RFM_Score'] > 9, 'RFM_Customer_Segments'] = "VIP/Loyal"  # scores 10-12
rfm.loc[(rfm['RFM_Score'] > 6) & (rfm['RFM_Score'] <= 9), 'RFM_Customer_Segments'] = "Potential Loyal"  # scores 7-9
rfm.loc[(rfm['RFM_Score'] > 4) & (rfm['RFM_Score'] <= 6), 'RFM_Customer_Segments'] = "At Risk Customers"  # scores 5-6
rfm.loc[(rfm['RFM_Score'] > 3) & (rfm['RFM_Score'] <= 4), 'RFM_Customer_Segments'] = "Can't Lose"  # score 4
rfm.loc[rfm['RFM_Score'] <= 3, 'RFM_Customer_Segments'] = "Lost"  # score 3

segment_counts = rfm['RFM_Customer_Segments'].value_counts().sort_index()
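
Since R, F, and M each range from 1 to 4, `RFM_Score` can only take the integer values 3 through 12, which makes it cheap to check that a set of score ranges labels every possible score exactly once. The sketch below does that with the standard library; the ranges in `RULES` are an assumption for illustration (score 5 is grouped with "At Risk Customers" here).

```python
# (inclusive low, inclusive high, label); illustrative ranges, adjust as needed.
RULES = [
    (10, 12, "VIP/Loyal"),
    (7, 9, "Potential Loyal"),
    (5, 6, "At Risk Customers"),
    (4, 4, "Can't Lose"),
    (3, 3, "Lost"),
]

def label_for(score):
    """Return the single label whose range contains the score, or raise."""
    matches = [label for low, high, label in RULES if low <= score <= high]
    if len(matches) != 1:
        raise ValueError(f"score {score} matched {len(matches)} rules")
    return matches[0]

# Raises if any possible score is unlabeled or labeled twice.
coverage = {score: label_for(score) for score in range(3, 13)}
print(coverage)
```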


In [106]:
segment_product_counts = rfm.groupby(['RFM_Segment_Label', 'RFM_Customer_Segments']).size().reset_index(name='Count')
segment_product_counts = segment_product_counts.sort_values('Count', ascending=False)


# Create the treemap
fig_treemap_segment_product = px.treemap(segment_product_counts,
                                         path=['RFM_Segment_Label', 'RFM_Customer_Segments'],
                                         values='Count',
                                         color='RFM_Segment_Label',
                                         color_discrete_sequence=px.colors.qualitative.Pastel,
                                         title='RFM Customer Segments by Value')

# Display the treemap
fig_treemap_segment_product.show()

In [107]:
vip_segment = rfm[rfm['RFM_Customer_Segments'] == 'VIP/Loyal']

fig = go.Figure()
fig.add_trace(go.Box(y=vip_segment['Recency'], name="Recency"))
fig.add_trace(go.Box(y=vip_segment['Frequency'], name='Frequency'))
fig.add_trace(go.Box(y=vip_segment['Value'], name='Value'))

fig.show()

In [108]:
correlation_matrix = vip_segment[['R', 'F', 'M']].corr()
correlation_matrix

,R,F,M
R,1.0,-0.224357,-0.169527
F,-0.224357,1.0,0.229672
M,-0.169527,0.229672,1.0
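
`DataFrame.corr()` computes the Pearson correlation coefficient by default. For reference, the coefficient can be sketched by hand as the covariance of two columns divided by the product of their standard deviations; the score lists below are invented and are not taken from the VIP segment.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Sums of products of deviations; the 1/n factors cancel in the ratio.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Invented F and M scores for five customers.
f_scores = [4, 3, 4, 2, 3]
m_scores = [4, 4, 3, 3, 2]
r = pearson(f_scores, m_scores)
print(round(r, 3))
```

Because the VIP/Loyal segment is selected on the sum of the three scores, correlations inside it are shaped by that selection and may not reflect the customer base as a whole.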


In [109]:
fig_heatmap = go.Figure(data=go.Heatmap(
    z=correlation_matrix.values,
    x=correlation_matrix.columns,
    y=correlation_matrix.columns,
    colorscale='RdBu',
    colorbar=dict(title='Correlation')
))

fig_heatmap.update_layout(title='Correlation Matrix of RFM Scores within VIP/Loyal Segment')


# Display the heatmap
fig_heatmap.show()

In [110]:
pastel_colors = plotly.colors.qualitative.Pastel

fig = go.Figure(data=[go.Bar(x=segment_counts.index, y=segment_counts.values, marker=dict(color=pastel_colors))])

vip_color = 'rgb(158, 202, 225)'

# Highlight the VIP/Loyal bar; the other bars keep their pastel colors
fig.update_traces(marker_color=[vip_color if segment == 'VIP/Loyal' else pastel_colors[i] for i, segment in enumerate(segment_counts.index)],
                  marker_line_color='rgb(8, 48, 107)',
                  marker_line_width=1.5, opacity=0.6)


# Update the layout
fig.update_layout(title='Comparison of RFM Segments',
                  xaxis_title='RFM Segments',
                  yaxis_title='Number of Customers',
                  showlegend=False)


# Display the figure
fig.show()

In [111]:
segment_scores = rfm.groupby('RFM_Customer_Segments')[['R', 'F', 'M']].mean().reset_index()

fig = go.Figure()

# Add bars for Recency score
fig.add_trace(go.Bar(
    x=segment_scores['RFM_Customer_Segments'],
    y=segment_scores['R'],
    name="Recency Score",
    marker_color='rgb(158,202,225)'
))

# Add bars for Frequency score
fig.add_trace(go.Bar(
    x=segment_scores['RFM_Customer_Segments'],
    y=segment_scores['F'],
    name="Frequency Score",
    marker_color='rgb(94,158,217)'
))

# Add bars for Monetary score
fig.add_trace(go.Bar(
    x=segment_scores['RFM_Customer_Segments'],
    y=segment_scores['M'],
    name="Monetary Score",
    marker_color='rgb(32,102,148)'
))


# Update the layout
fig.update_layout(
    title='Comparison of RFM Segments based on Recency, Frequency, and Monetary Scores',
    xaxis_title='RFM Segments',
    yaxis_title='Score',
    barmode='group',
    showlegend=True
)

# Display the figure
fig.show()