In [63]:
from faker import Faker
import numpy as np
import pandas as pd
from datetime import datetime, timedelta

In [64]:
# Making use of the synthetic data generated
final_df = pd.read_csv("../../data/processed/banking_behaviour_preference.csv")

# Part 1: Construction of Dynamic model

Consider the following scenarios. Existing customers may be re-evaluated, which causes their credit scores to change over time. Their savings, outstanding loans, and income are unlikely to remain static either, so the existing data will change. In addition, new customers will join the bank and some existing customers will close their accounts. Hence, we need a dynamic model that takes such changes into account.

## Feature engineering

To better illustrate the changes over time, we add a new feature `Time` to keep track of when each record was last updated. We assume that re-evaluation happens **every Friday**. The first Friday of 2024 is **05/01/2024**; for simplicity, the code below stamps the first batch with the start of that week, 2024-01-01.

In [65]:
start_date = pd.to_datetime("2024-01-01")  # Starting date for the first batch (a Monday; the first Friday of 2024 is 2024-01-05)
final_df.insert(1, 'Time', start_date)
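The weekly Friday schedule assumed above can be sketched with the standard library. This is a minimal illustration rather than part of the original pipeline; the helper name `friday_schedule` is hypothetical.

```python
from datetime import date, timedelta

def friday_schedule(year: int, n_batches: int) -> list[date]:
    """Return the first `n_batches` Fridays of `year` (hypothetical helper)."""
    first_day = date(year, 1, 1)
    # date.weekday() counts Monday as 0, so Friday is 4
    days_until_friday = (4 - first_day.weekday()) % 7
    first_friday = first_day + timedelta(days=days_until_friday)
    # One re-evaluation batch per week, always on a Friday
    return [first_friday + timedelta(weeks=k) for k in range(n_batches)]

batches = friday_schedule(2024, 3)
print(batches[0])  # 2024-01-05
```

Each later batch date is the previous one plus seven days, which matches the `timedelta(weeks=...)` arithmetic used later in the notebook.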

## 1.1 DataManager Class for Customer Data Management

### Overview
The `DataManager` class is designed to handle the management of customer data within a banking context. It allows for the addition of new customer data, updating existing records, and the removal of churned customers. This class is essential for maintaining an up-to-date customer database that reflects current information accurately.

### Features
- **Updating Records**: If a customer already exists in the current dataset (identified by `CLIENTNUM`), the class will replace the old record with the new data provided.
- **Appending New Records**: If a customer does not exist in the current dataset, the class will append the new record to the dataset.
- **Handling Churn**: The class can process a new column called `Churned`. If this value is `1`, indicating the customer has churned, the corresponding record will be removed from the current dataset.
- **Getter Function**: A method to retrieve the current state of the dataset after any updates.
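The three behaviours listed above (update, append, remove churned) can be illustrated on a toy table. The sketch below is a simplified stand-in, not the `DataManager` implementation: the function name `upsert_customers` is hypothetical, and it replaces whole rows, whereas the class overwrites only the columns present in the new batch.

```python
import pandas as pd

def upsert_customers(current: pd.DataFrame, batch: pd.DataFrame) -> pd.DataFrame:
    """Apply one batch to the current table (hypothetical helper)."""
    churned_ids = batch.loc[batch["Churned"] == 1, "CLIENTNUM"]
    active = batch.loc[batch["Churned"] == 0].drop(columns="Churned")
    # Keep only rows that are neither churned nor about to be replaced
    keep_mask = ~current["CLIENTNUM"].isin(churned_ids) & ~current["CLIENTNUM"].isin(active["CLIENTNUM"])
    kept = current[keep_mask]
    # Stack the untouched rows with the updated and new rows
    combined = pd.concat([kept, active], ignore_index=True)
    return combined.sort_values("CLIENTNUM").reset_index(drop=True)

current = pd.DataFrame({"CLIENTNUM": [1, 2, 3], "Balance": [100, 200, 300]})
batch = pd.DataFrame({"CLIENTNUM": [2, 3, 4], "Balance": [250, 0, 400], "Churned": [0, 1, 0]})
result = upsert_customers(current, batch)
print(result)
```

In this toy run, client 2 is updated, client 3 is removed because it churned, and client 4 is appended.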

In [66]:
class DataManager:
    def __init__(self, df):
        self.df = df.copy()

    def add_data(self, new_data):
        # Work on a copy so the caller's DataFrame is not modified in place
        new_data = new_data.copy()

        # Stamp the batch with today's date (time of day dropped);
        # this overwrites any 'Time' values already present in new_data
        current_time = pd.to_datetime('now').normalize()
        new_data['Time'] = current_time

        # Set CLIENTNUM as index for easy merging
        new_data.set_index('CLIENTNUM', inplace=True)

        # Remove churned customers from the current DataFrame
        churned_customers = new_data[new_data['Churned'] == 1].index
        self.df = self.df[~self.df['CLIENTNUM'].isin(churned_customers)]  # Remove churned customers

        # Set CLIENTNUM as index for the current dataframe
        self.df.set_index('CLIENTNUM', inplace=True)
        
        # Update existing records, ignoring churned customers
        self.df.update(new_data[new_data['Churned'] == 0])  
        
        # Append new records that don't exist in the current dataframe
        self.df = self.df.combine_first(new_data[new_data['Churned'] == 0])  

        # Reset index to return CLIENTNUM as a column
        self.df.reset_index(inplace=True)

        # Reorder columns to ensure 'Time' is right after 'CLIENTNUM'
        cols = list(self.df.columns)
        cols.insert(1, cols.pop(cols.index('Time')))  # Move 'Time' to right after 'CLIENTNUM'
        self.df = self.df[cols]

    def get(self):
        return self.df

## 1.2 PercentileCalculator Class for Feature Engineering and Percentile Calculation

### Overview
The `PercentileCalculator` class calculates percentiles for specified customer attributes and derives essential features such as financial status, loyalty, and digital capability. This class enables categorization based on percentile thresholds and prepares data for downstream segmentation.

### Features
- **Percentile Calculation**: Computes percentiles (20th, 50th, 80th) for the columns `Credit Score`, `Outstanding Loans`, `Balance`, `Total_Trans_Amt`, and `Total_Trans_Count`.
- **Digital Capability Calculation**: Determines if a customer is digitally capable based on indicators like phone service, internet service, tech support, paperless billing, and payment method.
- **Financial Status Scoring**: Calculates a financial status score that incorporates income category and percentile-based assessments for credit score, outstanding loans, and balance.
- **Loyalty Scoring**: Assigns a loyalty score based on total transaction amount, transaction count, and product usage.
- **Categorization**: Converts financial status and loyalty scores into categorical labels (`Low`, `Moderate`, `High`) using 20th and 80th percentiles.
- **Getter Method**: Returns a DataFrame containing `CLIENTNUM`, `Time`, `Financial_Status_Category`, `Loyalty_Category`, and `Digital_Capability` for use in segmentation.
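The categorization step can be sketched on a toy score column. This is an illustrative example only; the function name `categorize` and the sample scores are assumptions, while the cut-off logic (at or below the 20th percentile is `Low`, above the 80th is `High`) follows the description above.

```python
import pandas as pd

def categorize(scores: pd.Series) -> pd.Series:
    """Label scores Low / Moderate / High using the 20th and 80th percentiles."""
    low_cut = scores.quantile(0.2)
    high_cut = scores.quantile(0.8)
    return scores.apply(
        lambda x: "Low" if x <= low_cut else ("High" if x > high_cut else "Moderate")
    )

scores = pd.Series(range(1, 11))  # toy composite scores 1..10
labels = categorize(scores)
print(labels.value_counts())
```

Because the composite scores are small integers, many customers can tie at a threshold, so the three bands will not always hold exactly 20%, 60%, and 20% of customers.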


In [67]:
class PercentileCalculator:
    def __init__(self, df):
        # Initialize with existing data
        self.df = df.copy()
        self.percentiles = {}

    def calculate_percentiles(self):
        # Calculate percentiles for the required columns
        self.percentiles['Credit_Score'] = self.df['Credit Score'].quantile([0.2, 0.5, 0.8])
        self.percentiles['Outstanding_Loans'] = self.df['Outstanding Loans'].quantile([0.2, 0.5, 0.8])
        self.percentiles['Balance'] = self.df['Balance'].quantile([0.2, 0.5, 0.8])
        self.percentiles['Total_Trans_Amt'] = self.df['Total_Trans_Amt'].quantile([0.2, 0.5, 0.8])
        self.percentiles['Total_Trans_Count'] = self.df['Total_Trans_Count'].quantile([0.2, 0.5, 0.8])

    def calculate_digital_capability(self, row):
        score = 0
        score += row['PhoneService']
        score += 1 if row['InternetService'] in [0, 1] else 0
        score += 1 if row['TechSupport'] == 2 else 0
        score += row['PaperlessBilling']
        score += 2 if row['PaymentMethod'] in [0, 1] else 1 if row['PaymentMethod'] == 2 else 0
        return score > 2  # True for digitally capable, False for not capable

    def calculate_financial_status(self, row):
        score = 0
        # Income Category (strict rules)
        if row['Income_Category'] == '120 +':
            score += 3
        elif row['Income_Category'] == '80 - 120':
            score += 2
        elif row['Income_Category'] == '60 - 80':
            score += 1

        # Credit Score (percentile-based)
        if row['Credit Score'] > self.percentiles['Credit_Score'][0.8]:
            score += 3
        elif row['Credit Score'] > self.percentiles['Credit_Score'][0.5]:
            score += 2
        elif row['Credit Score'] > self.percentiles['Credit_Score'][0.2]:
            score += 1

        # Outstanding Loans (percentile-based)
        if row['Outstanding Loans'] < self.percentiles['Outstanding_Loans'][0.2]:
            score += 3
        elif row['Outstanding Loans'] < self.percentiles['Outstanding_Loans'][0.5]:
            score += 2
        elif row['Outstanding Loans'] < self.percentiles['Outstanding_Loans'][0.8]:
            score += 1

        # Balance (percentile-based)
        if row['Balance'] > self.percentiles['Balance'][0.8]:
            score += 3
        elif row['Balance'] > self.percentiles['Balance'][0.5]:
            score += 2
        elif row['Balance'] > self.percentiles['Balance'][0.2]:
            score += 1

        return score

    def calculate_loyalty_score(self, row):
        # Loyalty is a composite score based on:
        # - Total_Trans_Amt: Scaled to a maximum of 3 points.
        # - Total_Trans_Count: Scaled to a maximum of 3 points.
        # - No_of_product: Heavy User (>4 products): 3 points, Moderate User (3-4 products): 2 points, Light User (<=2 products): 1 point

        trans_amt_score = 3 if row['Total_Trans_Amt'] > self.percentiles['Total_Trans_Amt'][0.8] else \
                          2 if row['Total_Trans_Amt'] > self.percentiles['Total_Trans_Amt'][0.5] else \
                          1 if row['Total_Trans_Amt'] > self.percentiles['Total_Trans_Amt'][0.2] else 0

        trans_count_score = 3 if row['Total_Trans_Count'] > self.percentiles['Total_Trans_Count'][0.8] else \
                            2 if row['Total_Trans_Count'] > self.percentiles['Total_Trans_Count'][0.5] else \
                            1 if row['Total_Trans_Count'] > self.percentiles['Total_Trans_Count'][0.2] else 0

        product_usage_score = 3 if row['No_of_product'] > 4 else 2 if 3 <= row['No_of_product'] <= 4 else 1

        return trans_amt_score + trans_count_score + product_usage_score

    def perform_feature_engineering(self):
        # Perform feature engineering to calculate Financial_Status, Loyalty, and Digital_Capability
        self.df['Financial_Status'] = self.df.apply(self.calculate_financial_status, axis=1)
        self.df['Loyalty'] = self.df.apply(self.calculate_loyalty_score, axis=1)
        self.df['Digital_Capability'] = self.df.apply(self.calculate_digital_capability, axis=1)

    def categorize_financial_status_and_loyalty(self):
        # Calculate percentiles for Financial_Status and Loyalty
        loyalty_percentiles = self.df['Loyalty'].quantile([0.2, 0.8])
        financial_status_percentiles = self.df['Financial_Status'].quantile([0.2, 0.8])

        # Categorize Financial_Status
        self.df['Financial_Status_Category'] = self.df['Financial_Status'].apply(
            lambda x: 'Low' if x <= financial_status_percentiles[0.2] else
            ('High' if x > financial_status_percentiles[0.8] else 'Moderate'))

        # Categorize Loyalty
        self.df['Loyalty_Category'] = self.df['Loyalty'].apply(
            lambda x: 'Low' if x <= loyalty_percentiles[0.2] else
            ('High' if x > loyalty_percentiles[0.8] else 'Moderate'))

    def get_featured_data(self):
        # Perform feature engineering and return a DataFrame with CLIENTNUM, Time,
        # Financial_Status_Category, Loyalty_Category, and Digital_Capability
        if not self.percentiles:
            # The scoring methods read self.percentiles, so compute them if the caller has not done so yet
            self.calculate_percentiles()
        self.perform_feature_engineering()
        self.categorize_financial_status_and_loyalty()
        return self.df[['CLIENTNUM','Time','Financial_Status_Category', 'Loyalty_Category', 'Digital_Capability']]
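The digital capability rule in `calculate_digital_capability` can be checked by hand with a standalone function. This sketch mirrors the scoring in the class but takes plain integers instead of a DataFrame row; the function name is hypothetical, and what each encoded value stands for is not stated in this notebook, so the inputs below are arbitrary codes.

```python
def digital_capability_score(phone, internet, tech_support, paperless, payment_method):
    """Recompute the 0-6 digital score from already-encoded integer fields."""
    score = phone                                   # 0 or 1
    score += 1 if internet in (0, 1) else 0         # codes 0 and 1 earn a point
    score += 1 if tech_support == 2 else 0          # only code 2 earns a point
    score += paperless                              # 0 or 1
    score += 2 if payment_method in (0, 1) else 1 if payment_method == 2 else 0
    return score

weak = digital_capability_score(phone=1, internet=0, tech_support=0, paperless=0, payment_method=3)
strong = digital_capability_score(phone=1, internet=2, tech_support=2, paperless=1, payment_method=0)
print(weak, weak > 2)      # 2 False
print(strong, strong > 2)  # 5 True
```

A customer is flagged as digitally capable only when the score exceeds 2, so a score of exactly 2 is not enough.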

## 1.3 Segmentation Class for Rule-Based Customer Segmentation

### Overview
The `Segmentation` class classifies customers into predefined segments based on their financial status and loyalty categories derived from the `PercentileCalculator`. It consolidates segmentation results with original data, making it useful for targeted marketing strategies.

### Features
- **Segmentation Rules**: Defines segmentation based on combinations of financial status and loyalty categories, including `Low Financial status, Low Loyalty`, `High Financial status, High Loyalty`, and other combinations.
- **Segment Assignment**: Applies segmentation rules and assigns a `SEGMENT` label to each customer.
- **Data Integration**: Merges the segmentation result and digital capability with the original customer dataset.
- **Getter Functions**:
  - `get_segment_result()`: Provides a simplified DataFrame with `CLIENTNUM`, `Time`, `SEGMENT`, and `Digital_Capability`.
  - `get_original_with_segment()`: Returns the full original dataset with appended `SEGMENT` and `Digital_Capability`.
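To see how the nine category combinations collapse into five segments, the rules can be enumerated outside the class. This is a sketch that restates the same rules as a plain function; the name `assign_segment` is hypothetical.

```python
from itertools import product

def assign_segment(financial_status: str, loyalty: str) -> str:
    """Map a (financial status, loyalty) pair to its segment label."""
    if financial_status == "Low" and loyalty == "Low":
        return "Low Financial status, Low Loyalty"
    if financial_status == "High" and loyalty == "High":
        return "High Financial status, High Loyalty"
    if financial_status == "High":
        return "High Financial status, Low or Moderate Loyalty"
    if loyalty == "High":
        return "Low or Moderate Financial status, High Loyalty"
    return "Moderate or Low Financial status, Moderate or Low Loyalty"

levels = ("Low", "Moderate", "High")
# Enumerate every combination of the two categories
table = {(f, l): assign_segment(f, l) for f, l in product(levels, levels)}
for pair, segment in table.items():
    print(pair, "->", segment)
```

The final catch-all covers the three remaining pairs: (Moderate, Moderate), (Moderate, Low), and (Low, Moderate).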


In [68]:
class Segmentation:
    def __init__(self, original_df, featured_df):
        """
        Initialize the Segmentation class with the original and featured data.
        
        :param original_df: The original DataFrame from the DataManager.
        :param featured_df: The DataFrame output from the PercentileCalculator.
        """
        self.original_df = original_df.copy()
        self.featured_df = featured_df.copy()

    def apply_segmentation_rule(self, row):
        """
        Apply segmentation rule based on the Financial_Status_Category and Loyalty_Category.
        
        :param row: A row from the featured_df DataFrame
        :return: Segment as per the predefined rules
        """
        financial_status = row['Financial_Status_Category']
        loyalty = row['Loyalty_Category']

        if financial_status == 'Low' and loyalty == 'Low':
            return 'Low Financial status, Low Loyalty'
        elif financial_status == 'High' and loyalty == 'High':
            return 'High Financial status, High Loyalty'
        elif financial_status == 'High' and loyalty in ['Moderate', 'Low']:
            return 'High Financial status, Low or Moderate Loyalty'
        elif financial_status in ['Moderate', 'Low'] and loyalty == 'High':
            return 'Low or Moderate Financial status, High Loyalty'
        else:
            return 'Moderate or Low Financial status, Moderate or Low Loyalty'

    def perform_segmentation(self):
        """
        Perform segmentation and append the SEGMENT column.
        """
        # Apply segmentation rule
        self.featured_df['SEGMENT'] = self.featured_df.apply(self.apply_segmentation_rule, axis=1)
        
        # Combine SEGMENT and DIGITAL_CAPABILITY from featured_df into the original_df
        self.original_df = self.original_df.merge(
            self.featured_df[['CLIENTNUM', 'SEGMENT','Digital_Capability']],
            on='CLIENTNUM',
            how='left'
        )

        

    def get_segment_result(self):
        """
        Get the result DataFrame with only CLIENTNUM, Time, SEGMENT, and Digital_Capability (True/False).
        
        :return: A DataFrame containing CLIENTNUM, Time, SEGMENT, and Digital_Capability.
        """
        return self.original_df[['CLIENTNUM','Time', 'SEGMENT', 'Digital_Capability']]

    def get_original_with_segment(self):
        """
        Get the original DataFrame with SEGMENT and Digital_Capability appended.
        
        :return: The original DataFrame with the SEGMENT and Digital_Capability columns.
        """
        return self.original_df


## 1.4 DynamicCustomerSegmentation Class for End-to-End Dynamic Segmentation Workflow

### Overview
The `DynamicCustomerSegmentation` class orchestrates the workflow of updating customer data, recalculating features, and segmenting customers. It integrates the `DataManager`, `PercentileCalculator`, and `Segmentation` classes to streamline data processing, making it suitable for dynamic environments where customer data is regularly updated.

### Features
- **Data Management**: Uses `DataManager` to manage customer data, handle updates, add new records, and remove churned customers.
- **Feature Engineering and Percentile Calculation**: Employs `PercentileCalculator` to recalculate percentiles and derive essential features like financial status, loyalty, and digital capability.
- **Customer Segmentation**: Utilizes `Segmentation` to apply rule-based categorization and generate a segmentation result.
- **Unified Output**:
  - `process_new_data(new_data)`: Processes new customer data and returns a DataFrame with `CLIENTNUM`, `Time`, `SEGMENT`, and `Digital_Capability`.
  - `get_full_original_with_segment()`: Retrieves the original dataset with appended `SEGMENT` and `Digital_Capability`.
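One reason percentiles are recalculated on every update is that the thresholds move when the customer base changes, so the same customer can change category without any change in their own score. The toy example below illustrates this; the scores are made up for the illustration.

```python
import pandas as pd

week1 = pd.Series(range(1, 11))                                   # scores 1..10
week2 = pd.concat([week1, pd.Series(range(11, 21))], ignore_index=True)  # new, higher-scoring customers join

customer_score = 9
# "High" means strictly above the 80th percentile of the current population
high_before = bool(customer_score > week1.quantile(0.8))
high_after = bool(customer_score > week2.quantile(0.8))
print(high_before, high_after)  # True False
```

With fixed thresholds this customer would stay `High`; with recalculated thresholds the label reflects their position relative to the current population.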

In [69]:
class DynamicCustomerSegmentation:
    def __init__(self, initial_data):
        """
        Initialize the dynamic customer segmentation with the initial dataset.
        
        :param initial_data: The original dataset; it is wrapped in a DataManager.
        """
        self.data_manager = DataManager(initial_data)

    def process_new_data(self, new_data):
        """
        Process new data through the pipeline.
        
        :param new_data: The new batch of data to be processed.
        :return: A DataFrame with CLIENTNUM, Time, SEGMENT, and Digital_Capability.
        """
        # Step 1: Update the dataset using DataManager
        self.data_manager.add_data(new_data)
        updated_df = self.data_manager.get()

        # Step 2: Recalculate percentiles and perform feature engineering using PercentileCalculator
        percentile_calculator = PercentileCalculator(updated_df)
        percentile_calculator.calculate_percentiles()
        featured_data = percentile_calculator.get_featured_data()

        # Step 3: Perform segmentation using Segmentation class
        # Keep the instance so get_full_original_with_segment() can reuse its result
        self.segmentation = Segmentation(original_df=updated_df, featured_df=featured_data)
        self.segmentation.perform_segmentation()

        # Step 4: Return the segmented result
        segmented_result = self.segmentation.get_segment_result()  # Simplified output with CLIENTNUM, Time, SEGMENT, Digital_Capability
        return segmented_result

    def get_full_original_with_segment(self):
        """
        Get the full original DataFrame with SEGMENT and Digital_Capability appended.
        
        :return: The original DataFrame with the additional segmentation info.
        """
        # Reuse the segmentation computed by the most recent process_new_data() call;
        # a fresh Segmentation instance would not contain the SEGMENT column yet
        segmentation = getattr(self, 'segmentation', None)
        if segmentation is None:
            raise RuntimeError("Call process_new_data() before requesting the segmented dataset.")
        return segmentation.get_original_with_segment()

    def get_updated_initial_data(self):
        """
        Retrieve the updated initial data from DataManager, reflecting any recent changes.
        
        Returns:
        --------
        pandas.DataFrame
            The latest version of the initial dataset after all updates have been applied.
        """
        return self.data_manager.get()

# Part 2: Simulation of marketing effectiveness and customer satisfaction

We first build a generator that produces the simulation data.

## 2.1 `CustomerDataGenerator` Class Documentation

### Overview

The `CustomerDataGenerator` class is designed to create synthetic customer data with configurable parameters for `churn_rate`, `campaign_effectiveness`, and `customer_satisfaction`. This enables the generation of datasets with varying levels of customer engagement and satisfaction based on a controlled set of inputs. The class also allows for a consistent timestamp to be applied to each generated dataset, facilitating time-based comparisons.

### Class Parameters

When initializing the `CustomerDataGenerator`, the following parameters are available:

- **churn_rate** (`float`): Controls the fraction of sampled existing customers that are marked as churned in each generated batch. This value should be between `0` and `1`, where `1` means all sampled existing customers churn, and `0` means no churn.

- **campaign_effectiveness** (`float`): Represents the level of campaign effectiveness on customer behavior. This value ranges from `0` to `1`, where higher values indicate greater effectiveness, resulting in increased values for features such as `No_of_product`, `Total_Trans_Amt`, and `Total_Trans_Count`.

- **customer_satisfaction** (`float`): Represents the level of customer satisfaction. Similar to `campaign_effectiveness`, this value ranges from `0` to `1`. Higher values influence customer behavior positively, particularly for features related to product usage and transaction frequency.
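In the class below, the two factors are multiplied together to form a single uplift, and the result is capped. The sketch restates that arithmetic as a standalone function so it can be checked by hand; the function name `adjust` and its explicit parameters are assumptions made for this illustration.

```python
def adjust(value: int, effectiveness: float, satisfaction: float, cap: int) -> int:
    """Scale `value` by 1 + effectiveness * satisfaction, truncate, then cap."""
    factor = 1 + effectiveness * satisfaction  # between 1 and 2 when both inputs are in [0, 1]
    return min(int(value * factor), cap)

print(adjust(1000, 0.5, 0.5, 10000))  # 1250: factor is 1.25
print(adjust(3, 0.5, 0.5, 6))         # 3: 3.75 is truncated by int()
print(adjust(6000, 1.0, 1.0, 10000))  # 10000: 12000 is capped
print(adjust(1000, 0.0, 0.9, 10000))  # 1000: no uplift if either factor is 0
```

Because the factors are multiplied, high satisfaction cannot compensate for an ineffective campaign, and vice versa.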

### Methods

`generate_data(existing_df, num_records, timestamp=None)`

The main method to generate synthetic customer data.

##### Parameters:
- **existing_df** (`pd.DataFrame`): The original customer data from which existing clients are sampled. This dataset should contain a unique identifier column named `CLIENTNUM`.
- **num_records** (`int`): The total number of synthetic records to generate. The share of existing customers is drawn uniformly at random between 0% and 80% of this number (capped at the number of customers in `existing_df`); the remainder are new customers.
- **timestamp** (`datetime` or `str`): A fixed timestamp to apply to all generated records. This parameter allows time-based differentiation between datasets. Defaults to the current datetime when omitted.

##### Returns:
- **all_data** (`pd.DataFrame`): A DataFrame containing the generated data, with columns:
  - `CLIENTNUM`: Unique identifier for each customer.
  - `Income_Category`: Encoded as integers from 0 to 4.
  - `No_of_product`: Number of products used by the customer, adjusted based on effectiveness and satisfaction levels.
  - `Total_Trans_Amt`: Total transaction amount, scaled by effectiveness and satisfaction.
  - `Total_Trans_Count`: Total count of transactions, also influenced by effectiveness and satisfaction.
  - Additional columns representing customer attributes and behaviors.
  - `Churned`: Binary indicator where `1` represents a churned customer and `0` represents a retained customer.
  - `Time`: Timestamp indicating when the data was generated, set to the provided `timestamp` argument.
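The way a batch is divided between existing, new, and churned customers can be sketched in isolation. This mirrors the arithmetic in `generate_data` under one stated difference: it uses the standard-library `random` module with a fixed seed instead of `np.random`, and the function name `split_batch` is hypothetical.

```python
import random

def split_batch(num_records: int, num_existing_available: int, churn_rate: float, rng: random.Random):
    """Return (num_existing, num_new, num_churned) for one generated batch."""
    existing_share = rng.uniform(0.0, 0.8)  # at most 80% of the batch updates existing customers
    num_existing = min(int(existing_share * num_records), num_existing_available)
    num_new = num_records - num_existing    # the rest are brand-new customers
    num_churned = int(churn_rate * num_existing)  # churn applies to existing customers only
    return num_existing, num_new, num_churned

rng = random.Random(0)  # fixed seed so the split is reproducible
num_existing, num_new, num_churned = split_batch(10000, 100000, 0.1, rng)
print(num_existing, num_new, num_churned)
```

Note that new customers are never marked as churned, so the effective churn within a batch is always below `churn_rate` of the total batch size.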

In [70]:
class CustomerDataGenerator:
    def __init__(self, churn_rate=0.1, campaign_effectiveness=0.5, customer_satisfaction=0.5):
        """
        Initialize the generator with churn rate, campaign effectiveness, and customer satisfaction.
        
        :param churn_rate: Probability of churn for existing clients
        :param campaign_effectiveness: Scale factor (0-1) for campaign's effectiveness on features
        :param customer_satisfaction: Scale factor (0-1) for customer satisfaction effect on features
        """
        self.churn_rate = churn_rate
        self.campaign_effectiveness = campaign_effectiveness
        self.customer_satisfaction = customer_satisfaction
        self.fake = Faker()
    
    def _adjust_based_on_campaign(self, value, max_increase):
        """Adjust values based on campaign effectiveness and customer satisfaction."""
        increase_factor = 1 + self.campaign_effectiveness * self.customer_satisfaction  # Value between 1 and 2
        return min(int(value * increase_factor), max_increase)

    def generate_data(self, existing_df, num_records, timestamp=None):
        """
        Generate synthetic customer data with a fixed timestamp.
        
        :param existing_df: DataFrame containing existing customer data
        :param num_records: Total number of records to generate
        :param timestamp: Fixed datetime to apply to all records in the generated data
        :return: DataFrame with generated customer data
        """
        # Set the timestamp to the current datetime if none is provided
        if timestamp is None:
            timestamp = datetime.now()

        # Get existing client numbers
        existing_clients = existing_df['CLIENTNUM'].values

        # Determine number of existing and new records
        num_existing = int(np.random.uniform(0.0, 0.8) * num_records)
        num_existing = min(num_existing, len(existing_clients))  # Ensure we do not exceed the actual number
        num_new = num_records - num_existing

        # Select random existing clients for updates
        updated_client_nums = np.random.choice(existing_clients, size=num_existing, replace=False)
        new_client_nums = np.arange(existing_clients.max() + 1, existing_clients.max() + 1 + num_new)

        # Create updated records for existing clients
        updated_data = []
        for client in updated_client_nums:
            updated_data.append({
                'CLIENTNUM': client,
                'Income_Category': self.fake.random_int(min=0, max=4),  # Income category as integer
                'No_of_product': self._adjust_based_on_campaign(self.fake.random_int(min=1, max=3), 6),
                'Total_Trans_Amt': self._adjust_based_on_campaign(self.fake.random_int(min=500, max=2000), 10000),
                'Total_Trans_Count': self._adjust_based_on_campaign(self.fake.random_int(min=10, max=50), 150),
                'Credit Score': self.fake.random_int(min=300, max=850),
                'Outstanding Loans': self.fake.random_int(min=0, max=50000),
                'Balance': self.fake.random_int(min=0, max=300000),
                'PhoneService': self.fake.random_int(min=0, max=1),
                'InternetService': self.fake.random_int(min=0, max=2),
                'TechSupport': self.fake.random_int(min=0, max=2),
                'PaperlessBilling': self.fake.random_int(min=0, max=1),
                'PaymentMethod': self.fake.random_int(min=0, max=3),
                'Churned': 0,  # Initially set as not churned
                'Time': timestamp  # Fixed timestamp for each record
            })
        
        # Apply churn rate only to the updated (existing) customers
        num_churned = int(self.churn_rate * num_existing)
        churned_clients = np.random.choice(range(num_existing), size=num_churned, replace=False)
        
        for i in churned_clients:
            updated_data[i]['Churned'] = 1  # Mark these clients as churned

        # Create new customer records (with no churn)
        new_data = []
        for client in new_client_nums:
            new_data.append({
                'CLIENTNUM': client,
                'Income_Category': self.fake.random_int(min=0, max=4),  # Income category as integer
                'No_of_product': self._adjust_based_on_campaign(self.fake.random_int(min=1, max=3), 6),
                'Total_Trans_Amt': self._adjust_based_on_campaign(self.fake.random_int(min=500, max=2000), 10000),
                'Total_Trans_Count': self._adjust_based_on_campaign(self.fake.random_int(min=10, max=50), 150),
                'Credit Score': self.fake.random_int(min=300, max=850),
                'Outstanding Loans': self.fake.random_int(min=0, max=50000),
                'Balance': self.fake.random_int(min=0, max=300000),
                'PhoneService': self.fake.random_int(min=0, max=1),
                'InternetService': self.fake.random_int(min=0, max=2),
                'TechSupport': self.fake.random_int(min=0, max=2),
                'PaperlessBilling': self.fake.random_int(min=0, max=1),
                'PaymentMethod': self.fake.random_int(min=0, max=3),
                'Churned': 0,  # New customers are not churned
                'Time': timestamp  # Fixed timestamp for each record
            })
        
        # Combine updated and new data into one DataFrame
        all_data = pd.DataFrame(updated_data + new_data)

        return all_data

## 2.2 Data generation and actual simulation

We will first make use of `CustomerDataGenerator` to synthesise data for high customer satisfaction and high campaign effectiveness.

In [71]:
high_generator = CustomerDataGenerator(churn_rate=0.1, campaign_effectiveness=0.9, customer_satisfaction=0.9)
initial_date = datetime(2024, 1, 1)
high_timestamp = initial_date + timedelta(weeks=1)

# Generate synthetic data with high effectiveness and satisfaction
high_data = high_generator.generate_data(final_df, num_records=10000, timestamp=high_timestamp)

Then, we will make use of `DynamicCustomerSegmentation` to update our segments.

In [72]:
# Initialize with the initial dataset (final_df)
dynamic_segmentation = DynamicCustomerSegmentation(initial_data=final_df)

# Process new data and get the segmented result
segmented_result = dynamic_segmentation.process_new_data(high_data)

# Optionally, get the full original data with SEGMENT and DIGITAL_CAPABILITY appended
full_data_with_segments = dynamic_segmentation.get_full_original_with_segment()
updated = dynamic_segmentation.get_updated_initial_data()

# Display the segmented result
print(segmented_result)

        CLIENTNUM       Time  \
0            7239 2024-01-01   
1           47140 2024-01-01   
2           50411 2024-01-01   
3           64671 2024-01-01   
4           65296 2024-01-01   
...           ...        ...   
113943  999983290 2024-11-12   
113944  999983291 2024-11-12   
113945  999983292 2024-11-12   
113946  999983293 2024-11-12   
113947  999983294 2024-11-12   

                                                  SEGMENT  Digital_Capability  
0          High Financial status, Low or Moderate Loyalty                True  
1       Moderate or Low Financial status, Moderate or ...               False  
2       Moderate or Low Financial status, Moderate or ...                True  
3                       Low Financial status, Low Loyalty                True  
4       Moderate or Low Financial status, Moderate or ...                True  
...                                                   ...                 ...  
113943  Moderate or Low Financial status, Moderate or .

Now, we will generate data where customer satisfaction and campaign effectiveness are low.

In [73]:
low_generator = CustomerDataGenerator(churn_rate=0.1, campaign_effectiveness=0.1, customer_satisfaction=0.1)
low_timestamp = initial_date + timedelta(weeks=2)
# Generate synthetic data with low effectiveness and satisfaction
low_data = low_generator.generate_data(updated, num_records=8000, timestamp=low_timestamp)

Then, we will update our segments again.

In [74]:
# Initialize with the dataset updated in the previous step (updated)
dynamic_segmentation = DynamicCustomerSegmentation(initial_data=updated)

# Process the new low-effectiveness data and get the segmented result
segmented_result1 = dynamic_segmentation.process_new_data(low_data)

# Optionally, get the full original data with SEGMENT and DIGITAL_CAPABILITY appended
full_data_with_segments1 = dynamic_segmentation.get_full_original_with_segment()
updated1 = dynamic_segmentation.get_updated_initial_data()

# Display the segmented result
print(segmented_result1)
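To compare two simulation runs, one option is to look at how the share of customers in each segment changes between them. The sketch below uses two tiny made-up result frames in place of the real segmentation outputs; the helper name `segment_shares` is hypothetical.

```python
import pandas as pd

def segment_shares(result: pd.DataFrame) -> pd.Series:
    """Fraction of customers in each segment."""
    return result["SEGMENT"].value_counts(normalize=True)

high = "High Financial status, High Loyalty"
low = "Low Financial status, Low Loyalty"

# Toy stand-ins for the results of two consecutive runs
run_a = pd.DataFrame({"SEGMENT": [high, high, low, low]})
run_b = pd.DataFrame({"SEGMENT": [high, low, low, low]})

# Positive values mean the segment grew between the two runs
shift = segment_shares(run_b).sub(segment_shares(run_a), fill_value=0.0)
print(shift)
```

Using `fill_value=0.0` keeps segments that appear in only one of the two runs in the comparison instead of producing missing values.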