In [1]:
import pandas as pd

In [2]:
# Making use of the synthetic data generated
final_df = pd.read_csv("../../data/processed/banking_behaviour_preference.csv")

# Customer Segmentation Model

## 1. Feature Engineering Rules

#### 1.1 Digital Capability
We will combine the following features to create the `Digital_Capability` score:
- **PhoneService**: 
  - If the customer has phone service (`PhoneService = 1`), add 1 point. 
  - Else, do not award any point.
- **InternetService**: 
  - If the customer has **Fiber optic internet** (`InternetService = 1`) or **DSL** (`InternetService = 0`), add 1 point.
  - If the customer has **no** internet service (`InternetService = 2`), do not award any point.
- **TechSupport**: 
  - If the customer uses tech support (`TechSupport = 2`), add 1 point.
  - If no tech support (`TechSupport = 0` or `TechSupport = 1`), add 0 points.
- **PaperlessBilling**: 
  - If the customer has paperless billing (`PaperlessBilling = 1`), add 1 point.
  - Else, do not add any point.
- **PaymentMethod**: 
  - If the customer uses **automatic payments** (`PaymentMethod = 0 or 1`), add 2 points.
  - If **electronic check** (`PaymentMethod = 2`), add 1 point.
  - If **mailed check** (`PaymentMethod = 3`), do not add any point.

**Total score range**: 0 to 6  
- **Digital Capability**: If the score is greater than 2, the customer is flagged as digitally capable (`True`); otherwise the flag is `False`.
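
The rules above can also be written as vectorized pandas operations, which usually scale better than the row-wise `DataFrame.apply(..., axis=1)` used in the class below. A minimal sketch, assuming the five columns are integer-encoded exactly as listed; `digital_capability_score` is a hypothetical helper and the sample rows are made up:

```python
import numpy as np
import pandas as pd

def digital_capability_score(df: pd.DataFrame) -> pd.Series:
    """Vectorized Digital_Capability score (0 to 6) following the rules above."""
    return (
        df["PhoneService"].astype(int)                    # 1 point if phone service
        + df["InternetService"].isin([0, 1]).astype(int)  # 1 point for DSL or fiber optic
        + (df["TechSupport"] == 2).astype(int)            # 1 point if tech support
        + df["PaperlessBilling"].astype(int)              # 1 point if paperless billing
        + np.select(
            [df["PaymentMethod"].isin([0, 1]), df["PaymentMethod"] == 2],
            [2, 1],                                       # automatic payment / electronic check
            default=0,                                    # mailed check
        )
    )

# Hypothetical sample rows, for illustration only
sample = pd.DataFrame({
    "PhoneService":     [1, 0, 1],
    "InternetService":  [1, 2, 0],
    "TechSupport":      [2, 0, 1],
    "PaperlessBilling": [1, 0, 0],
    "PaymentMethod":    [0, 3, 2],
})
sample["Digital_Score"] = digital_capability_score(sample)
sample["Digital_Capability"] = sample["Digital_Score"] > 2
print(sample[["Digital_Score", "Digital_Capability"]])
```

Comparing this column with the output of `CustomerSegmentation.digital_capability` on the real data is a cheap way to check that both encode the same rules.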

---

#### 1.2 Financial Status

We will combine the following features to create the `Financial_Status` score using **percentiles**, except for **Income_Category**, which will use strict rules:

- **Income_Category** (strict rule):
  - Assign points based on income category:
    - If the income category is `120 +`, add **3 points**.
    - If the income category is `80 - 120`, add **2 points**.
    - If the income category is `60 - 80`, add **1 point**.
    - For any other category (e.g. `Less than 40`), add **0 points**.

- **Credit Score**:
  - Assign points based on the credit score percentile:
    - If the credit score is above the 80th percentile (top 20%), add **3 points**.
    - If the credit score is between the 50th and 80th percentile, add **2 points**.
    - If the credit score is between the 20th and 50th percentile, add **1 point**.
    - If it is at or below the 20th percentile, add **0 points**.

- **Outstanding Loans**:
  - Assign points based on the loan amount percentile:
    - If the loan amount is below the 20th percentile (the smallest 20% of loans, e.g., less than $10,000), add **3 points**.
    - If the loan amount is between the 20th and 50th percentile, add **2 points**.
    - If the loan amount is between the 50th and 80th percentile, add **1 point**.
    - If it is at or above the 80th percentile, add **0 points**.

- **Balance**:
  - Assign points based on the balance percentile:
    - If the balance is above the 80th percentile (top 20%), add **3 points**.
    - If the balance is between the 50th and 80th percentile, add **2 points**.
    - If the balance is between the 20th and 50th percentile, add **1 point**.
    - If it is at or below the 20th percentile, add **0 points**.

**Total score range**: 0 to 12 (higher means stronger financial status).
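
Every percentile rule in this section follows the same pattern, so it can be factored into one helper. A minimal sketch, assuming the column names and income labels used in the class below; `percentile_points` and the sample rows are hypothetical, and the percentiles here are computed on the toy sample rather than the real data. Strict `>` / `<` comparisons mirror `financial_status`, so a value exactly on a cut-off drops to the lower band:

```python
import numpy as np
import pandas as pd

def percentile_points(values: pd.Series, higher_is_better: bool = True) -> pd.Series:
    """Award 0 to 3 points by comparing each value with its 20th, 50th and 80th percentiles."""
    q20, q50, q80 = values.quantile([0.2, 0.5, 0.8])
    if higher_is_better:
        # above the 80th percentile -> 3, above the 50th -> 2, above the 20th -> 1
        conditions = [values > q80, values > q50, values > q20]
    else:
        # below the 20th percentile -> 3, below the 50th -> 2, below the 80th -> 1
        conditions = [values < q20, values < q50, values < q80]
    return pd.Series(np.select(conditions, [3, 2, 1], default=0), index=values.index)

income_points = {"120 +": 3, "80 - 120": 2, "60 - 80": 1}  # any other category -> 0

# Hypothetical sample rows, for illustration only
df = pd.DataFrame({
    "Income_Category":   ["120 +", "60 - 80", "Less than 40", "80 - 120", "Less than 40"],
    "Credit Score":      [820, 640, 580, 710, 690],
    "Outstanding Loans": [5_000, 30_000, 45_000, 12_000, 20_000],
    "Balance":           [90_000, 15_000, 2_000, 40_000, 25_000],
})
df["Financial_Status"] = (
    df["Income_Category"].map(income_points).fillna(0).astype(int)
    + percentile_points(df["Credit Score"])
    + percentile_points(df["Outstanding Loans"], higher_is_better=False)  # fewer loans is better
    + percentile_points(df["Balance"])
)
print(df["Financial_Status"].tolist())  # [12, 4, 0, 8, 3]
```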

---

#### 1.3 Transaction Behavior
We will create a composite score for `Transaction_Behavior`:
- **Total_Trans_Amt**: 
  - If the total transaction amount is above the 80th percentile (top 20%), add **3 points**.
  - If it is between the 50th and 80th percentile, add **2 points**.
  - If it is between the 20th and 50th percentile, add **1 point**.
  - If it is at or below the 20th percentile, add **0 points**.
- **Total_Trans_Count**: 
  - If the total transaction count is above the 80th percentile (top 20%), add **3 points**.
  - If it is between the 50th and 80th percentile, add **2 points**.
  - If it is between the 20th and 50th percentile, add **1 point**.
  - If it is at or below the 20th percentile, add **0 points**.

**Total score range**: 0 to 6 (higher means frequent and high-value transactions).
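
Another way to compute a 0 to 3 percentile score is to count how many of the three cut-offs a value strictly exceeds, which `numpy.searchsorted` does directly with `side="left"`. A sketch with a hypothetical `band_points` helper and made-up sample data:

```python
import numpy as np
import pandas as pd

def band_points(values: pd.Series) -> pd.Series:
    """0 to 3 points = how many of the 20th/50th/80th percentiles the value strictly exceeds."""
    cutoffs = values.quantile([0.2, 0.5, 0.8]).to_numpy()  # non-decreasing
    # side="left" returns, for each value, the number of cut-offs strictly smaller than it
    return pd.Series(np.searchsorted(cutoffs, values.to_numpy(), side="left"), index=values.index)

# Hypothetical sample rows, for illustration only
tx = pd.DataFrame({
    "Total_Trans_Amt":   [1200, 4500, 15000, 2600, 800],
    "Total_Trans_Count": [30, 70, 120, 55, 20],
})
tx["Transaction_Behavior"] = band_points(tx["Total_Trans_Amt"]) + band_points(tx["Total_Trans_Count"])
print(tx["Transaction_Behavior"].tolist())  # [2, 4, 6, 2, 0]
```

Unlike `pd.cut`, which by default raises an error when two bin edges are equal, this approach still works when ties make two percentiles coincide, as can happen with integer columns such as `Total_Trans_Count`.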

---

#### 1.4 Product Usage
We will categorize customers based on the number of products they use and assign them a `Product_Usage` label:
- **Heavy User**: Customers using more than 4 products, **award 3 points**
- **Moderate User**: Customers using 3-4 products, **award 2 points**
- **Light User**: Customers using 2 or fewer products, **award 1 point**
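
As in `product_usage` in the class below, any count under 3 (including 0) falls through to the Light User tier. A vectorized sketch, with a hypothetical helper name and made-up counts:

```python
import numpy as np
import pandas as pd

def product_usage_points(n_products: pd.Series) -> pd.Series:
    """3 points for more than 4 products, 2 for 3 to 4, 1 otherwise."""
    return pd.Series(
        np.select([n_products > 4, n_products >= 3], [3, 2], default=1),
        index=n_products.index,
    )

counts = pd.Series([0, 1, 2, 3, 4, 5])  # hypothetical product counts
print(product_usage_points(counts).tolist())  # [1, 1, 1, 2, 2, 3]
```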

---

#### 1.5 Loyalty Score
The `Loyalty` score will be a combination of the **Transaction Behavior** and **Product Usage** scores.

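**Total Loyalty score range**: 1 to 9 (Transaction Behavior contributes 0 to 6 points and Product Usage always contributes at least 1 point)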
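
Because Product Usage never awards fewer than 1 point, the lowest reachable Loyalty score is 1. A quick enumeration of every combination of component scores confirms the reachable range:

```python
from itertools import product

# Transaction_Behavior is the sum of two 0-3 percentile scores; Product_Usage is 1, 2 or 3
transaction_scores = {amt + cnt for amt, cnt in product(range(4), repeat=2)}
loyalty_scores = {t + p for t, p in product(transaction_scores, [1, 2, 3])}
print(min(loyalty_scores), max(loyalty_scores))  # 1 9
```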

## 2. Feature Integration: Banking Behavior and Customer Preferences

#### 2.1. **Loyalty**
Combines `Transaction_Behavior` and `Product_Usage` to measure how engaged the customer is with the bank.
Value ranges from 1 to 9.
#### 2.2. **Financial Status**
Measures the customer’s financial health based on `income`, `credit score`, `outstanding loans`, and `balance`.
Value ranges from 0-12
#### 2.3. **Digital capability**
Captures whether the customer is comfortable using digital technology.
Binary flag derived from a 0 to 6 score based on `PhoneService`, `InternetService`, `TechSupport`, `PaperlessBilling`, and `PaymentMethod`.

## 3. Rule-Based Segmentation Based on Loyalty and Financial status

#### Classification Ranges Based on Percentiles:

##### 3.1. **Loyalty**:
We will categorize customers into **Low**, **Moderate**, and **High** loyalty based on the composite `Loyalty` score:
- **Low (L)**: Loyalty score in the **bottom 20%** of the population.
- **Moderate (M)**: Loyalty score between the **20th and 80th percentile**.
- **High (H)**: Loyalty score in the **top 20%** of the population.

##### 3.2. **Financial Status**:
We will categorize customers into **Low**, **Moderate**, and **High** financial status based on the `Financial_Status` score:
- **Low (L)**: Financial status score in the **bottom 20%** of the population.
- **Moderate (M)**: Financial status score between the **20th and 80th percentile**.
- **High (H)**: Financial status score in the **top 20%** of the population.
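
Because the scores are small integers, many customers share the same value, and the strict `>` comparisons in `assign_loyalty_level` and `assign_financial_status_level` put every customer tied at a cut-off into the lower group. The resulting split can therefore differ noticeably from 20/60/20. A small sketch with made-up scores; `assign_level` is a hypothetical helper mirroring those methods:

```python
import numpy as np
import pandas as pd

def assign_level(scores: pd.Series) -> pd.Series:
    """'High' above the 80th percentile, 'Moderate' above the 20th, otherwise 'Low'."""
    q20, q80 = scores.quantile([0.2, 0.8])
    return pd.Series(
        np.select([scores > q80, scores > q20], ["High", "Moderate"], default="Low"),
        index=scores.index,
    )

# Hypothetical integer Loyalty scores with many ties
loyalty = pd.Series([2, 3, 3, 3, 4, 4, 4, 5, 5, 7])
levels = assign_level(loyalty)
print(levels.value_counts().to_dict())  # {'Moderate': 5, 'Low': 4, 'High': 1}
```

Here the 20th and 80th percentiles are 3 and 5, so 4 of the 10 customers land in Low and only 1 in High.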

#### 3.3 **Segments**:

**This part also describes the likely representatives of each segment and their needs.**

1. **Low Financial status, Low Loyalty**:
   - **Financial status**: Low 
   - **Loyalty**: Low 
   - **Business Insight**: Customers in this segment likely include students or younger individuals just starting out, or elderly customers who aren't financially active. A focus on retention strategies and promoting entry-level products (like low-cost savings accounts) is key. Physical outreach methods may be necessary, especially if they lack digital capability.

2. **High Financial status, High Loyalty**:
   - **Financial status**: High 
   - **Loyalty**: High 
   - **Business Insight**: These are premium customers, likely professionals or established individuals with a strong engagement in banking products. Offering personalized premium services, loyalty rewards, and investment products should be prioritized. Reach out via digital channels for convenience, or through dedicated relationship managers.

3. **High Financial status, Low or Moderate Loyalty**:
   - **Financial status**: High
   - **Loyalty**: Low or Moderate
   - **Business Insight**: These financially capable customers may not be fully engaged with the bank’s products. They could be busy professionals or high-income earners who mainly bank elsewhere. Increase engagement by promoting exclusive offers, premium credit cards, or mortgage products. Encourage digital engagement to make banking easier and more convenient for them.

4. **Low or Moderate Financial status, High Loyalty**:
   - **Financial status**: Low or Moderate
   - **Loyalty**: High
   - **Business Insight**: These customers, while financially modest, are loyal users of the bank’s products. Likely individuals in middle-income brackets or those rebuilding credit, they should be offered value-driven products such as cashback credit cards or debt consolidation services. Educating them on budgeting tools via digital channels may improve their financial health.

5. **Moderate Financial status, Moderate Loyalty**, **Low Financial status, Moderate Loyalty**, and **Moderate Financial status, Low Loyalty**:
   - **Financial status**: Moderate or Low
   - **Loyalty**: Moderate or Low
   - **Business Insight**: This segment represents customers with no strong financial activity or engagement. They could be occasional users, young professionals, or middle-income families. Focus on financial education, product bundling, or targeted campaigns that address their potential needs, such as home loans or long-term savings plans. If digitally capable, promote app-based interactions; otherwise, rely on physical channels.
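
The five segments cover all nine combinations of the two three-level labels. Checking the rules exhaustively on a small grid makes that explicit; `segment_from_levels` is a hypothetical vectorized counterpart of `assign_segment` in the class below, with the conditions evaluated in the same order (first match wins):

```python
import numpy as np
import pandas as pd

def segment_from_levels(fin_level: pd.Series, loyalty_level: pd.Series) -> pd.Series:
    """Map (financial status level, loyalty level) pairs to the five segment labels."""
    conditions = [
        (fin_level == "Low") & (loyalty_level == "Low"),
        (fin_level == "High") & (loyalty_level == "High"),
        fin_level == "High",      # loyalty is Low or Moderate at this point
        loyalty_level == "High",  # financial status is Low or Moderate at this point
    ]
    segments = [
        "Low Financial status, Low Loyalty",
        "High Financial status, High Loyalty",
        "High Financial status, Low or Moderate Loyalty",
        "Low or Moderate Financial status, High Loyalty",
    ]
    default = "Moderate or Low Financial status, Moderate or Low Loyalty"
    return pd.Series(np.select(conditions, segments, default=default), index=fin_level.index)

levels = ["Low", "Moderate", "High"]
grid = pd.DataFrame([(f, l) for f in levels for l in levels], columns=["Financial", "Loyalty"])
grid["Segment"] = segment_from_levels(grid["Financial"], grid["Loyalty"])
print(grid)
```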
  
---

**Digital Capability**:
- A binary flag (`True`/`False`) indicating whether the customer is digitally capable.
- Customers who are digitally capable should be prioritized for online banking services, mobile app usage, and digital communication.
- Non-digitally capable customers may prefer physical channels, like branch visits or mailed offers, so consider traditional methods of outreach.
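
One way to turn the flag into an outreach channel per customer, and to see how channels split across segments, is sketched below. The `Channel` column and its two values are hypothetical, and the rows are made up:

```python
import pandas as pd

# Hypothetical segmentation output, for illustration only
result = pd.DataFrame({
    "Segment": [
        "High Financial status, High Loyalty",
        "Low Financial status, Low Loyalty",
        "High Financial status, High Loyalty",
        "Low Financial status, Low Loyalty",
    ],
    "Digital_Capability": [True, False, False, True],
})
# Digitally capable customers -> digital channels; everyone else -> physical channels
result["Channel"] = result["Digital_Capability"].map({True: "Digital", False: "Physical"})
channel_mix = pd.crosstab(result["Segment"], result["Channel"])  # customers per segment and channel
print(channel_mix)
```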

In [3]:
class CustomerSegmentation:
    def __init__(self, df):
        self.df = df.copy()
        self.percentiles = {}

    def calculate_initial_percentiles(self):
        # Calculate percentiles for features before Loyalty and Financial_Status are created
        self.percentiles['Credit_Score'] = self.df['Credit Score'].quantile([0.2, 0.5, 0.8])
        self.percentiles['Outstanding_Loans'] = self.df['Outstanding Loans'].quantile([0.2, 0.5, 0.8])
        self.percentiles['Balance'] = self.df['Balance'].quantile([0.2, 0.5, 0.8])
        self.percentiles['Total_Trans_Amt'] = self.df['Total_Trans_Amt'].quantile([0.2, 0.5, 0.8])
        self.percentiles['Total_Trans_Count'] = self.df['Total_Trans_Count'].quantile([0.2, 0.5, 0.8])

    def calculate_final_percentiles(self):
        # After Loyalty and Financial_Status have been created, calculate their percentiles
        self.percentiles['Loyalty'] = self.df['Loyalty'].quantile([0.2, 0.8])
        self.percentiles['Financial_Status'] = self.df['Financial_Status'].quantile([0.2, 0.8])

    def digital_capability(self, row):
        score = 0
        score += row['PhoneService']
        score += 1 if row['InternetService'] in [0, 1] else 0
        score += 1 if row['TechSupport'] == 2 else 0
        score += row['PaperlessBilling']
        score += 2 if row['PaymentMethod'] in [0, 1] else 1 if row['PaymentMethod'] == 2 else 0
        return score > 2

    def financial_status(self, row):
        score = 0
        # Income Category (strict rules)
        if row['Income_Category'] == '120 +':
            score += 3
        elif row['Income_Category'] == '80 - 120':
            score += 2
        elif row['Income_Category'] == '60 - 80':
            score += 1

        # Credit Score (percentile-based)
        if row['Credit Score'] > self.percentiles['Credit_Score'][0.8]:
            score += 3
        elif row['Credit Score'] > self.percentiles['Credit_Score'][0.5]:
            score += 2
        elif row['Credit Score'] > self.percentiles['Credit_Score'][0.2]:
            score += 1

        # Outstanding Loans (percentile-based)
        if row['Outstanding Loans'] < self.percentiles['Outstanding_Loans'][0.2]:
            score += 3
        elif row['Outstanding Loans'] < self.percentiles['Outstanding_Loans'][0.5]:
            score += 2
        elif row['Outstanding Loans'] < self.percentiles['Outstanding_Loans'][0.8]:
            score += 1

        # Balance (percentile-based)
        if row['Balance'] > self.percentiles['Balance'][0.8]:
            score += 3
        elif row['Balance'] > self.percentiles['Balance'][0.5]:
            score += 2
        elif row['Balance'] > self.percentiles['Balance'][0.2]:
            score += 1

        return score

    def transaction_behavior(self, row):
        score = 0
        if row['Total_Trans_Amt'] > self.percentiles['Total_Trans_Amt'][0.8]:
            score += 3
        elif row['Total_Trans_Amt'] > self.percentiles['Total_Trans_Amt'][0.5]:
            score += 2
        elif row['Total_Trans_Amt'] > self.percentiles['Total_Trans_Amt'][0.2]:
            score += 1

        if row['Total_Trans_Count'] > self.percentiles['Total_Trans_Count'][0.8]:
            score += 3
        elif row['Total_Trans_Count'] > self.percentiles['Total_Trans_Count'][0.5]:
            score += 2
        elif row['Total_Trans_Count'] > self.percentiles['Total_Trans_Count'][0.2]:
            score += 1

        return score

    def product_usage(self, row):
        if row['No_of_product'] > 4:
            return 3
        elif 3 <= row['No_of_product'] <= 4:
            return 2
        return 1

    def loyalty_score(self, row):
        return self.transaction_behavior(row) + self.product_usage(row)

    def assign_loyalty_level(self, loyalty_score):
        if loyalty_score > self.percentiles['Loyalty'][0.8]:
            return 'High'
        elif loyalty_score > self.percentiles['Loyalty'][0.2]:
            return 'Moderate'
        else:
            return 'Low'

    def assign_financial_status_level(self, financial_status_score):
        if financial_status_score > self.percentiles['Financial_Status'][0.8]:
            return 'High'
        elif financial_status_score > self.percentiles['Financial_Status'][0.2]:
            return 'Moderate'
        else:
            return 'Low'

    def assign_segment(self, row):
        # Assign loyalty and financial status levels
        loyalty_label = self.assign_loyalty_level(row['Loyalty'])
        financial_status_label = self.assign_financial_status_level(row['Financial_Status'])

        # Return segment based on the classification of loyalty and financial status
        if financial_status_label == 'Low' and loyalty_label == 'Low':
            return 'Low Financial status, Low Loyalty'
        elif financial_status_label == 'High' and loyalty_label == 'High':
            return 'High Financial status, High Loyalty'
        elif financial_status_label == 'High' and loyalty_label in ['Moderate', 'Low']:
            return 'High Financial status, Low or Moderate Loyalty'
        elif financial_status_label in ['Moderate', 'Low'] and loyalty_label == 'High':
            return 'Low or Moderate Financial status, High Loyalty'
        else:
            return 'Moderate or Low Financial status, Moderate or Low Loyalty'

    def perform_segmentation(self):
        # Calculate percentiles for features before Loyalty and Financial Status
        self.calculate_initial_percentiles()

        self.df['Digital_Capability'] = self.df.apply(self.digital_capability, axis=1)
        self.df['Financial_Status'] = self.df.apply(self.financial_status, axis=1)
        self.df['Loyalty'] = self.df.apply(self.loyalty_score, axis=1)

        # After Loyalty and Financial_Status are created, calculate their percentiles
        self.calculate_final_percentiles()

        # Assign segment based on loyalty and financial status
        self.df['Segment'] = self.df.apply(self.assign_segment, axis=1)

        return self.df[['CLIENTNUM','Loyalty','Financial_Status','Segment', 'Digital_Capability']]

    def predict(self, new_data):
        # Apply the segmentation rules to new data, reusing the percentile thresholds
        # learned in perform_segmentation() (which must be called first)
        if 'Loyalty' not in self.percentiles:
            raise RuntimeError("Call perform_segmentation() before predict().")
        new_data = new_data.copy()  # do not modify the caller's DataFrame
        new_data['Digital_Capability'] = new_data.apply(self.digital_capability, axis=1)
        new_data['Financial_Status'] = new_data.apply(self.financial_status, axis=1)
        new_data['Loyalty'] = new_data.apply(self.loyalty_score, axis=1)

        # Loyalty and Financial_Status levels use the training-data percentiles
        new_data['Segment'] = new_data.apply(self.assign_segment, axis=1)
        return new_data[['CLIENTNUM','Loyalty','Financial_Status','Segment', 'Digital_Capability']]

## 4. Testing 

In [4]:
# Initialize and test with final_df
segmentation_test = CustomerSegmentation(final_df)
segmentation_result = segmentation_test.perform_segmentation()

# Output first few rows and value counts for segmentation
print(segmentation_result.head())
print(segmentation_result['Segment'].value_counts())

   CLIENTNUM  Loyalty  Financial_Status  \
0  768805383        4                 7   
1  818770008        3                 3   
2  713982108        3                 5   
3  709106358        3                 5   
4  713061558        2                 4   

                                             Segment  Digital_Capability  
0  Moderate or Low Financial status, Moderate or ...                True  
1                  Low Financial status, Low Loyalty                True  
2  Moderate or Low Financial status, Moderate or ...                True  
3  Moderate or Low Financial status, Moderate or ...                True  
4  Moderate or Low Financial status, Moderate or ...                True  
Segment
Moderate or Low Financial status, Moderate or Low Loyalty    5288
High Financial status, Low or Moderate Loyalty                910
Low or Moderate Financial status, High Loyalty                513
Low Financial status, Low Loyalty                             294
High Financial stat

## 5. Other Considerations

We recode the segment labels as integers so that the result can be used more easily by other models in our project.

1. **Low Financial status, Low Loyalty** = 1
  
2. **Moderate or Low Financial status, Moderate or Low Loyalty** = 2

3. **Low or Moderate Financial status, High Loyalty** = 3

4. **High Financial status, Low or Moderate Loyalty** = 4
   
5. **High Financial status, High Loyalty** = 5

In [5]:
# Define the mapping for the segments
segment_mapping = {
    "Low Financial status, Low Loyalty": 1,
    "Moderate or Low Financial status, Moderate or Low Loyalty": 2,
    "Low or Moderate Financial status, High Loyalty": 3,
    "High Financial status, Low or Moderate Loyalty": 4,
    "High Financial status, High Loyalty": 5
}

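# Replace the Segment labels with their integer codes; assign() returns a new
# DataFrame, which avoids chained-assignment warnings and yields an integer column
segmentation_result = segmentation_result.assign(Segment=segmentation_result['Segment'].map(segment_mapping))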

# Display the updated segmentation_result DataFrame
segmentation_result.head()

   CLIENTNUM  Loyalty  Financial_Status  Segment  Digital_Capability
0  768805383        4                 7        2                True
1  818770008        3                 3        1                True
2  713982108        3                 5        2                True
3  709106358        3                 5        2                True
4  713061558        2                 4        2                True
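
`Series.map` with a dict returns `NaN` for any label missing from the dict, so a misspelled segment name would silently become a missing code (and turn the column into floats). A quick check for unmapped labels, plus the inverse mapping for turning codes back into readable labels, might look like this (the sample labels are made up):

```python
import pandas as pd

segment_mapping = {
    "Low Financial status, Low Loyalty": 1,
    "Moderate or Low Financial status, Moderate or Low Loyalty": 2,
    "Low or Moderate Financial status, High Loyalty": 3,
    "High Financial status, Low or Moderate Loyalty": 4,
    "High Financial status, High Loyalty": 5,
}

labels = pd.Series([
    "High Financial status, High Loyalty",
    "Low Financial status, Low Loyalty",
    "High Financial status, High loyalty",  # deliberate typo: lowercase "loyalty"
])
codes = labels.map(segment_mapping)
unmapped = labels[codes.isna()].unique()  # labels that found no code
print(unmapped)

# Inverse mapping: integer code -> segment label
code_to_segment = {code: label for label, code in segment_mapping.items()}
decoded = codes.dropna().astype(int).map(code_to_segment)
print(decoded.tolist())
```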


In [6]:
# saving the result
segmentation_result.to_csv("../../data/processed/segementation_result_static.csv", index = False)