In [1]:
from datetime import datetime, timedelta

import numpy as np
import pandas as pd
from faker import Faker

In [2]:
# Making use of the synthetic data generated
final_df = pd.read_csv("../../data/processed/banking_behaviour_preference.csv")

# Part 1: Construction of Dynamic model

Consider the following scenarios. Existing customers may be re-evaluated, which causes their credit scores to change over time. Their savings, outstanding loans, and income are unlikely to stay static either, so the existing data will change. In addition, new customers will join the bank and some existing customers will close their accounts. Hence, we need a dynamic model that takes such changes into account.

## Feature engineering

To illustrate changes over time, we add a new feature, `Time`, that records when each record was last updated. We assume that re-evaluation happens **once a week**. For simplicity, the first batch of data is dated **2024-01-01**, the first Monday of 2024, and each later batch is dated 7 days after the previous one.

In [3]:
start_date = pd.to_datetime("2024-01-01")  # Starting date (a Monday); later batches follow at 7-day intervals
final_df.insert(1, 'Time', start_date)
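
As a quick sanity check on the dates used in this notebook, the standard library can report the weekday of each one. This is a minimal sketch; the helper name `weekday_name` is hypothetical and introduced only for illustration.

```python
from datetime import date, timedelta

# Explicit names avoid any dependence on the system locale.
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

def weekday_name(d: date) -> str:
    """Return the English weekday name for a date (hypothetical helper)."""
    return WEEKDAYS[d.weekday()]

first_batch = date(2024, 1, 1)                # value assigned to start_date
next_batch = first_batch + timedelta(days=7)  # one weekly step later
first_friday = date(2024, 1, 5)

for d in (first_batch, first_friday, next_batch):
    print(d.isoformat(), weekday_name(d))
```

The same check can be repeated for any later batch by adding further multiples of 7 days.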

## 1.1 DataManager Class for Customer Data Management

### Overview
The `DataManager` class is designed to handle the management of customer data within a banking context. It allows for the addition of new customer data, updating existing records, and the removal of churned customers. This class is essential for maintaining an up-to-date customer database that reflects current information accurately.

### Features
- **Updating Records**: If a customer already exists in the current dataset (identified by `CLIENTNUM`), the class will replace the old record with the new data provided.
- **Appending New Records**: If a customer does not exist in the current dataset, the class will append the new record to the dataset.
- **Handling Churn**: The class can process a new column called `Churned`. If this value is `1`, indicating the customer has churned, the corresponding record will be removed from the current dataset.
- **Getter Function**: A method to retrieve the current state of the dataset after any updates.

In [4]:
class DataManager:
    def __init__(self, df):
        self.df = df.copy()

    def add_data(self, new_data):
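        # Both columns are used below, so fail early with a clear message
        missing = {'CLIENTNUM', 'Churned'} - set(new_data.columns)
        if missing:
            raise KeyError(f"Columns missing in new_data: {sorted(missing)}")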
        
        # Remove churned customers in new_data
        self.df = self.df[~self.df['CLIENTNUM'].isin(new_data[new_data['Churned'] == 1]['CLIENTNUM'])]
        
        new_data = new_data[new_data['Churned'] == 0].drop_duplicates(subset='CLIENTNUM')
        
        # Update existing records
        self.df.set_index('CLIENTNUM', inplace=True)
        new_data.set_index('CLIENTNUM', inplace=True)
        self.df.update(new_data)
        
        # Append new records
        new_clients = new_data[~new_data.index.isin(self.df.index)]
        self.df = pd.concat([self.df, new_clients])
        self.df.reset_index(inplace=True)

    def get(self):
        return self.df
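
To make the update, append, and churn rules concrete, here is a minimal stand-alone sketch that mirrors the three steps of `add_data` on a toy table. The function name `upsert_customers` and the sample values are assumptions made for illustration, not part of the notebook.

```python
import pandas as pd

def upsert_customers(current: pd.DataFrame, batch: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-alone version of the churn/update/append steps."""
    # 1. Drop customers that the batch flags as churned.
    churned_ids = batch.loc[batch["Churned"] == 1, "CLIENTNUM"]
    kept = current[~current["CLIENTNUM"].isin(churned_ids)].set_index("CLIENTNUM")

    # 2. Overwrite overlapping cells for customers that already exist.
    active = (batch[batch["Churned"] == 0]
              .drop_duplicates(subset="CLIENTNUM")
              .set_index("CLIENTNUM"))
    kept.update(active)

    # 3. Append customers that were not present before.
    newcomers = active[~active.index.isin(kept.index)]
    return pd.concat([kept, newcomers]).reset_index()

current = pd.DataFrame({"CLIENTNUM": [1, 2, 3],
                        "Balance": [100.0, 200.0, 300.0]})
batch = pd.DataFrame({"CLIENTNUM": [2, 3, 4],
                      "Balance": [250.0, 300.0, 50.0],
                      "Churned": [0, 1, 0]})

result = upsert_customers(current, batch)
print(result)
```

In this toy run, customer 3 is dropped because it is flagged as churned, customer 2 takes the new balance, and customer 4 is appended. Columns that exist only in the batch, such as `Churned`, are left as `NaN` for rows that were already present.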

## 1.2 PercentileCalculator Class for Feature Engineering and Percentile Calculation

### Overview
The `PercentileCalculator` class calculates percentiles for specified customer attributes and derives essential features such as financial status, loyalty, and digital capability. This class enables categorization based on percentile thresholds and prepares data for downstream segmentation.

### Features
- **Percentile Calculation**: Computes percentiles (20th, 50th, 80th) for the columns `Credit Score`, `Outstanding Loans`, `Balance`, `Total_Trans_Amt`, and `Total_Trans_Count`.
- **Digital Capability Calculation**: Determines if a customer is digitally capable based on indicators like phone service, internet service, tech support, and payment method.
- **Financial Status Scoring**: Calculates a financial status score that incorporates income category and percentile-based assessments for credit score, outstanding loans, and balance.
- **Loyalty Scoring**: Assigns a loyalty score based on total transaction amount, transaction count, and product usage.
- **Categorization**: Converts financial status and loyalty scores into categorical labels (`Low`, `Moderate`, `High`) using 20th and 80th percentiles.
- **Getter Method**: Returns a DataFrame containing `CLIENTNUM`, `Time`, `Financial_Status_Category`, `Loyalty_Category`, and `Digital_Capability` for use in segmentation.
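
The digital capability rule can be read as a small point system. The sketch below reproduces the scoring used by `calculate_digital_capability` on plain dictionaries; the meaning of each integer code is not stated in this notebook, so the codes are copied as-is and the sample rows are invented for illustration.

```python
def digital_capability(row: dict) -> bool:
    """Score a customer record and return True when the score exceeds 2."""
    score = 0
    score += row["PhoneService"]                            # 0 or 1
    score += 1 if row["InternetService"] in (0, 1) else 0
    score += 1 if row["TechSupport"] == 2 else 0
    score += row["PaperlessBilling"]                        # 0 or 1
    if row["PaymentMethod"] in (0, 1):
        score += 2
    elif row["PaymentMethod"] == 2:
        score += 1
    return score > 2

# Invented sample rows covering a high, a zero, and a borderline score.
high = {"PhoneService": 1, "InternetService": 0, "TechSupport": 2,
        "PaperlessBilling": 1, "PaymentMethod": 0}          # score 6
zero = {"PhoneService": 0, "InternetService": 2, "TechSupport": 0,
        "PaperlessBilling": 0, "PaymentMethod": 3}          # score 0
borderline = {"PhoneService": 1, "InternetService": 2, "TechSupport": 0,
              "PaperlessBilling": 0, "PaymentMethod": 2}    # score 2

for name, row in [("high", high), ("zero", zero), ("borderline", borderline)]:
    print(name, digital_capability(row))
```

Because the threshold is strict (`score > 2`), a customer with exactly 2 points is treated as not digitally capable.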


In [5]:
class PercentileCalculator:
    def __init__(self, df):
        # Initialize with existing data
        self.df = df.copy()
        self.percentiles = {}

    def calculate_percentiles(self):
        # Calculate percentiles for the required columns
        self.percentiles['Credit_Score'] = self.df['Credit Score'].quantile([0.2, 0.5, 0.8])
        self.percentiles['Outstanding_Loans'] = self.df['Outstanding Loans'].quantile([0.2, 0.5, 0.8])
        self.percentiles['Balance'] = self.df['Balance'].quantile([0.2, 0.5, 0.8])
        self.percentiles['Total_Trans_Amt'] = self.df['Total_Trans_Amt'].quantile([0.2, 0.5, 0.8])
        self.percentiles['Total_Trans_Count'] = self.df['Total_Trans_Count'].quantile([0.2, 0.5, 0.8])

    def calculate_digital_capability(self, row):
        score = 0
        score += row['PhoneService']
        score += 1 if row['InternetService'] in [0, 1] else 0
        score += 1 if row['TechSupport'] == 2 else 0
        score += row['PaperlessBilling']
        score += 2 if row['PaymentMethod'] in [0, 1] else 1 if row['PaymentMethod'] == 2 else 0
        return score > 2  # True for digitally capable, False otherwise

    def calculate_financial_status(self, row):
        score = 0
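        # Income Category (strict rules). These branches compare against string labels,
        # so integer-coded categories match none of them and add 0 points.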
        if row['Income_Category'] == '120 +':
            score += 3
        elif row['Income_Category'] == '80 - 120':
            score += 2
        elif row['Income_Category'] == '60 - 80':
            score += 1

        # Credit Score (percentile-based)
        if row['Credit Score'] > self.percentiles['Credit_Score'][0.8]:
            score += 3
        elif row['Credit Score'] > self.percentiles['Credit_Score'][0.5]:
            score += 2
        elif row['Credit Score'] > self.percentiles['Credit_Score'][0.2]:
            score += 1

        # Outstanding Loans (percentile-based)
        if row['Outstanding Loans'] < self.percentiles['Outstanding_Loans'][0.2]:
            score += 3
        elif row['Outstanding Loans'] < self.percentiles['Outstanding_Loans'][0.5]:
            score += 2
        elif row['Outstanding Loans'] < self.percentiles['Outstanding_Loans'][0.8]:
            score += 1

        # Balance (percentile-based)
        if row['Balance'] > self.percentiles['Balance'][0.8]:
            score += 3
        elif row['Balance'] > self.percentiles['Balance'][0.5]:
            score += 2
        elif row['Balance'] > self.percentiles['Balance'][0.2]:
            score += 1

        return score

    def calculate_loyalty_score(self, row):
        # Loyalty is a composite score based on:
        # - Total_Trans_Amt: Scaled to a maximum of 3 points.
        # - Total_Trans_Count: Scaled to a maximum of 3 points.
        # - No_of_product: Heavy User (>4 products): 3 points, Moderate User (3-4 products): 2 points, Light User (<=2 products): 1 point

        trans_amt_score = 3 if row['Total_Trans_Amt'] > self.percentiles['Total_Trans_Amt'][0.8] else \
                          2 if row['Total_Trans_Amt'] > self.percentiles['Total_Trans_Amt'][0.5] else \
                          1 if row['Total_Trans_Amt'] > self.percentiles['Total_Trans_Amt'][0.2] else 0

        trans_count_score = 3 if row['Total_Trans_Count'] > self.percentiles['Total_Trans_Count'][0.8] else \
                            2 if row['Total_Trans_Count'] > self.percentiles['Total_Trans_Count'][0.5] else \
                            1 if row['Total_Trans_Count'] > self.percentiles['Total_Trans_Count'][0.2] else 0

        product_usage_score = 3 if row['No_of_product'] > 4 else 2 if 3 <= row['No_of_product'] <= 4 else 1

        return trans_amt_score + trans_count_score + product_usage_score

    def perform_feature_engineering(self):
        # Perform feature engineering to calculate Financial_Status, Loyalty, and Digital_Capability
        self.df['Financial_Status'] = self.df.apply(self.calculate_financial_status, axis=1)
        self.df['Loyalty'] = self.df.apply(self.calculate_loyalty_score, axis=1)
        self.df['Digital_Capability'] = self.df.apply(self.calculate_digital_capability, axis=1)

    def categorize_financial_status_and_loyalty(self):
        # Calculate percentiles for Financial_Status and Loyalty
        loyalty_percentiles = self.df['Loyalty'].quantile([0.2, 0.8])
        financial_status_percentiles = self.df['Financial_Status'].quantile([0.2, 0.8])

        # Categorize Financial_Status
        self.df['Financial_Status_Category'] = self.df['Financial_Status'].apply(
            lambda x: 'Low' if x <= financial_status_percentiles[0.2] else
            ('High' if x > financial_status_percentiles[0.8] else 'Moderate'))

        # Categorize Loyalty
        self.df['Loyalty_Category'] = self.df['Loyalty'].apply(
            lambda x: 'Low' if x <= loyalty_percentiles[0.2] else
            ('High' if x > loyalty_percentiles[0.8] else 'Moderate'))

    def get_featured_data(self):
        # Perform feature engineering and return a DataFrame with CLIENTNUM, Time,
        # Financial_Status_Category, Loyalty_Category, and Digital_Capability
        self.perform_feature_engineering()
        self.categorize_financial_status_and_loyalty()
        return self.df[['CLIENTNUM','Time','Financial_Status_Category', 'Loyalty_Category', 'Digital_Capability']]
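
The percentile-based scoring above follows one pattern for every column: compare a value against the 20th, 50th, and 80th percentiles and award 0 to 3 points. A minimal sketch of that pattern on invented data is shown below; `percentile_score` and its `higher_is_better` flag are hypothetical names, used here to cover both the "higher is better" columns (such as `Balance`) and the reversed case (`Outstanding Loans`).

```python
import pandas as pd

def percentile_score(value, thresholds, higher_is_better=True):
    """Return 0-3 points by comparing value with the 20/50/80th percentiles."""
    q20, q50, q80 = thresholds.loc[0.2], thresholds.loc[0.5], thresholds.loc[0.8]
    if higher_is_better:
        return 3 if value > q80 else 2 if value > q50 else 1 if value > q20 else 0
    return 3 if value < q20 else 2 if value < q50 else 1 if value < q80 else 0

# Invented sample: ten evenly spaced values.
values = pd.Series([10, 20, 30, 40, 50, 60, 70, 80, 90, 100])
thresholds = values.quantile([0.2, 0.5, 0.8])

balance_scores = [percentile_score(v, thresholds) for v in values]
loan_scores = [percentile_score(v, thresholds, higher_is_better=False) for v in values]

print(thresholds.to_dict())
print(balance_scores)
print(loan_scores)
```

Because the thresholds are recomputed from whatever data is current, the same raw value can earn a different score after each weekly update.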

## 1.3 Segmentation Class for Rule-Based Customer Segmentation

### Overview
The `Segmentation` class classifies customers into predefined segments based on their financial status and loyalty categories derived from the `PercentileCalculator`. It consolidates segmentation results with original data, making it useful for targeted marketing strategies.

### Features
- **Segmentation Rules**: Defines segmentation based on combinations of financial status and loyalty categories, including `Low Financial status, Low Loyalty`, `High Financial status, High Loyalty`, and other combinations.
- **Segment Assignment**: Applies segmentation rules and assigns a `SEGMENT` label to each customer.
- **Data Integration**: Merges the segmentation result and digital capability with the original customer dataset.
- **Getter Functions**:
  - `get_segment_result()`: Provides a simplified DataFrame with `CLIENTNUM`, `Time`, `SEGMENT`, and `Digital_Capability`.
  - `get_original_with_segment()`: Returns the full original dataset with appended `SEGMENT` and `Digital_Capability`.


In [6]:
class Segmentation:
    def __init__(self, original_df, featured_df):
        """
        Initialize the Segmentation class with the original and featured data.
        
        :param original_df: The original DataFrame from the DataManager.
        :param featured_df: The DataFrame output from the PercentileCalculator.
        """
        self.original_df = original_df.copy()
        self.featured_df = featured_df.copy()

    def apply_segmentation_rule(self, row):
        """
        Apply segmentation rule based on the Financial_Status_Category and Loyalty_Category.
        
        :param row: A row from the featured_df DataFrame
        :return: Segment as per the predefined rules
        """
        financial_status = row['Financial_Status_Category']
        loyalty = row['Loyalty_Category']

        if financial_status == 'Low' and loyalty == 'Low':
            return 'Minimal Engagers'
        elif financial_status == 'High' and loyalty == 'High':
            # NOTE: same label as the Low/Low branch, so High/High customers are not a separate segment
            return 'Minimal Engagers'
        elif financial_status == 'High' and loyalty in ['Moderate', 'Low']:
            return 'Affluent Observers'
        elif financial_status in ['Moderate', 'Low'] and loyalty == 'High':
            return 'Loyal Savers'
        else:
            return 'Casual Browsers'

    def perform_segmentation(self):
        """
        Perform segmentation and append the SEGMENT column.
        """
        # Apply segmentation rule
        self.featured_df['SEGMENT'] = self.featured_df.apply(self.apply_segmentation_rule, axis=1)
        
        # Combine SEGMENT and Digital_Capability from featured_df into the original_df
        self.original_df = self.original_df.merge(
            self.featured_df[['CLIENTNUM', 'SEGMENT','Digital_Capability']],
            on='CLIENTNUM',
            how='left'
        )

        

    def get_segment_result(self):
        """
        Get the result DataFrame with only CLIENTNUM, Time, SEGMENT, and Digital_Capability (True/False).
        
        :return: A DataFrame containing CLIENTNUM, Time, SEGMENT, and Digital_Capability.
        """
        return self.original_df[['CLIENTNUM','Time', 'SEGMENT', 'Digital_Capability']]

    def get_original_with_segment(self):
        """
        Get the original DataFrame with SEGMENT appended.
        
        :return: The original DataFrame with SEGMENT columns.
        """
        return self.original_df
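
To see which label each combination of categories receives, the rule can be enumerated over all nine pairs. The sketch below copies the branches of `apply_segmentation_rule` as written in the class above into a stand-alone function; the name `segment_for` is hypothetical.

```python
from collections import Counter
from itertools import product

LEVELS = ("Low", "Moderate", "High")

def segment_for(financial_status: str, loyalty: str) -> str:
    """Stand-alone copy of the segmentation branches, in the same order."""
    if financial_status == "Low" and loyalty == "Low":
        return "Minimal Engagers"
    if financial_status == "High" and loyalty == "High":
        return "Minimal Engagers"  # same label as Low/Low in the class as written
    if financial_status == "High":
        return "Affluent Observers"
    if loyalty == "High":
        return "Loyal Savers"
    return "Casual Browsers"

table = {(f, l): segment_for(f, l) for f, l in product(LEVELS, LEVELS)}
for (f, l), segment in table.items():
    print(f"{f:<9} {l:<9} -> {segment}")

# Number of category pairs that fall into each segment.
pairs_per_segment = Counter(table.values())
print(pairs_per_segment)
```

The enumeration makes it easy to confirm that every pair is covered and to spot pairs that share a label.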


## 1.4 DynamicCustomerSegmentation Class for End-to-End Dynamic Segmentation Workflow

### Overview
The `DynamicCustomerSegmentation` class orchestrates the workflow of updating customer data, recalculating features, and segmenting customers. It integrates the `DataManager`, `PercentileCalculator`, and `Segmentation` classes to streamline data processing, making it suitable for dynamic environments where customer data is regularly updated.

### Features
- **Data Management**: Uses `DataManager` to manage customer data, handle updates, add new records, and remove churned customers.
- **Feature Engineering and Percentile Calculation**: Employs `PercentileCalculator` to recalculate percentiles and derive essential features like financial status, loyalty, and digital capability.
- **Customer Segmentation**: Utilizes `Segmentation` to apply rule-based categorization and generate a segmentation result.
- **Unified Output**:
  - `process_new_data(new_data)`: Processes new customer data and stores a DataFrame with `CLIENTNUM`, `Time`, `SEGMENT`, and `Digital_Capability`, which `get_segmented_result()` returns.
  - `get_full_original_with_segment()`: Retrieves the original dataset with appended `SEGMENT` and `Digital_Capability`.

In [7]:
class DynamicCustomerSegmentation:
    """
    DynamicCustomerSegmentation class orchestrates managing, updating, and segmenting customer data.
    It integrates data management, percentile calculation, and segmentation functionality, 
    allowing for real-time updates with each new data batch.
    
    Attributes:
    ----------
    data_manager : DataManager
        Instance of DataManager for managing the primary customer dataset.
    percentile_calculator : PercentileCalculator
        Instance of PercentileCalculator for handling percentile calculations.
    segmentation : Segmentation
        Instance of Segmentation for customer segmentation logic.
    
    updated_df : pd.DataFrame
        The current state of the updated customer data.
    featured_data : pd.DataFrame
        DataFrame containing engineered features for segmentation.
    segmented_result : pd.DataFrame
        Final segmented data with CLIENTNUM, Time, SEGMENT, and Digital_Capability.
    """
    
    def __init__(self, initial_data):
        """
        Initializes the dynamic segmentation system with the initial dataset.
        
        Parameters:
        ----------
        initial_data : pd.DataFrame
            Initial customer dataset to initialize DataManager and other components.
        """
        self.data_manager = DataManager(initial_data)
        self.updated_df = self.data_manager.get()  # Store updated data as an attribute
        self.percentile_calculator = None
        self.featured_data = None
        self.segmentation = None
        self.segmented_result = None

    def process_new_data(self, new_data):
        """
        Processes new customer data by updating the dataset, recalculating features, 
        and performing segmentation. Stores the results as attributes for easy access.

        Parameters:
        ----------
        new_data : pd.DataFrame
            New data batch to integrate with the existing dataset.
        
        Updates:
        --------
        - self.updated_df : Latest updated data from DataManager.
        - self.featured_data : DataFrame with calculated features.
        - self.segmented_result : Segmented data with CLIENTNUM, Time, SEGMENT, Digital_Capability.
        """
        # Step 1: Update the dataset using DataManager
        self.data_manager.add_data(new_data)
        self.updated_df = self.data_manager.get()  # Updated data stored in updated_df
        
        # Step 2: Calculate percentiles and perform feature engineering
        self.percentile_calculator = PercentileCalculator(self.updated_df)
        self.percentile_calculator.calculate_percentiles()
        self.featured_data = self.percentile_calculator.get_featured_data()
        
        # Step 3: Perform segmentation and store the result
        self.segmentation = Segmentation(original_df=self.updated_df, featured_df=self.featured_data)
        self.segmentation.perform_segmentation()
        self.segmented_result = self.segmentation.get_segment_result()  # Simplified output

    def get_segmented_result(self):
        """
        Retrieves the latest segmented result.
        
        Returns:
        --------
        pd.DataFrame:
            The DataFrame containing CLIENTNUM, Time, SEGMENT, and Digital_Capability.
        """
        return self.segmented_result

    def get_full_original_with_segment(self):
        """
        Retrieves the full original data with the latest segmentation details appended.
        
        Returns:
        --------
        pd.DataFrame:
            Original DataFrame with SEGMENT and Digital_Capability columns appended.
        """
        return self.segmentation.get_original_with_segment()

    def get_updated_initial_data(self):
        """
        Directly accesses the latest updated data managed by DataManager.
        
        Returns:
        --------
        pd.DataFrame:
            The most current version of the initial dataset after all updates.
        """
        return self.updated_df

# Part 2: Simulation of marketing effectiveness and customer satisfaction

We first build a model to generate the simulation data.

## 2.1 CustomerDataGenerator Class

The `CustomerDataGenerator` class is designed to generate and modify synthetic customer data based on configurable churn rates, campaign effectiveness, and customer satisfaction levels. The class supports two main operations: updating existing customer data and generating new customer entries. Each operation applies a timestamp to modified or new data entries, simulating a rolling 7-day increment in time. 

### Attributes

- **churn_rate** (`float`): Fraction of the customers selected for update in a round that are flagged as churned, between 0 and 1.
- **campaign_effectiveness** (`float`): A factor between -1 and 1 that influences customer engagement metrics (e.g., product usage and transaction frequency) and the number of new customers generated. Higher values make campaigns more effective, increasing customer engagement.
- **customer_satisfaction** (`float`): A factor between -1 and 1 that scales values such as transaction amounts and counts. Higher values indicate greater satisfaction.
- **current** (`pandas.DataFrame`): The base DataFrame representing the current set of customers, used for creating modified or new customer data.
- **new** (`pandas.DataFrame`): A DataFrame containing only the modified rows from existing customers and newly generated customer entries.
- **time** (`datetime`): Tracks the date for data timestamps, starting from `2024-01-08` (one week after the initial batch dated `2024-01-01`) and advancing by 7 days on each call to `reset`. This timestamp is applied to all newly generated or modified data entries.

### Methods

- **`update_existing_data()`**: Modifies a random subset (between 0% and 50%) of the existing customers based on `churn_rate`, `campaign_effectiveness`, and `customer_satisfaction`. Only modified rows are added to `new`, each with a timestamp from `self.time`.
  
- **`generate_new_customers()`**: Generates new customer entries influenced by `campaign_effectiveness`, and assigns a unique `CLIENTNUM` for each new entry. The `self.time` timestamp is also applied to each new entry.

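- **`reset(new_df, churn_rate=0, campaign_effectiveness=0, customer_satisfaction=0)`**: Replaces the `current` dataset with `new_df`, clears `new`, and advances `self.time` by 7 days to simulate the progression of time. The three numeric arguments are increments: each is added to the current value, and the result is clamped to [0, 1] for `churn_rate` and to [-1, 1] for the other two.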

- **`get_new_data()`**: Returns the `new` DataFrame containing only the modified and newly generated customer data, each entry reflecting updates in customer attributes and the current timestamp.


In [8]:
class CustomerDataGenerator:
    def __init__(self, initial_data, churn_rate, campaign_effectiveness, customer_satisfaction):
        self.churn_rate = churn_rate
        self.campaign_effectiveness = campaign_effectiveness
        self.customer_satisfaction = customer_satisfaction
        self.fake = Faker()
        self.current = initial_data.copy()
        self.new = pd.DataFrame()
        self.time = datetime(2024, 1, 8)

    def _generate_unique_clientnum(self):
        """Generate a unique CLIENTNUM not already in the current dataset."""
        while True:
            clientnum = np.random.randint(1, 999999999)  # Any number up to 9 digits
            if clientnum not in self.current['CLIENTNUM'].values:
                return clientnum

    def _adjust_value(self, value, max_value, min_value=1):
        if self.campaign_effectiveness < 0 or self.customer_satisfaction < 0:
            factor = 1 - abs(self.campaign_effectiveness * 0.2 + self.customer_satisfaction * 0.2)
        else:
            factor = 1 + self.campaign_effectiveness * 0.5 * self.customer_satisfaction * 0.5
        return max(min(int(value * factor), max_value), min_value)

    def update_existing_data(self):
        num_to_update = int(np.random.uniform(0.0, 0.5) * len(self.current))
        num_churned = int(self.churn_rate * num_to_update)
        update_clients = np.random.choice(self.current.index, size=num_to_update, replace=False)
        churned_clients = np.random.choice(update_clients, size=num_churned, replace=False)
        
        modified_data = []
        for idx in update_clients:
            row = self.current.loc[idx].copy()
            row['No_of_product'] = self._adjust_value(row['No_of_product'], 6)
            row['Total_Trans_Amt'] = self._adjust_value(row['Total_Trans_Amt'], 10000)
            row['Total_Trans_Count'] = self._adjust_value(row['Total_Trans_Count'], 150)
            row['Churned'] = 1 if idx in churned_clients else 0
            row['Time'] = self.time
            modified_data.append(row)
        self.new = pd.concat([self.new, pd.DataFrame(modified_data)], ignore_index=True)

    def generate_new_customers(self):
        num_new = int(10000 * (self.campaign_effectiveness * 0.5)) if self.campaign_effectiveness > 0 else 0
        if num_new > 0:
            new_customers = [{
                'CLIENTNUM': self._generate_unique_clientnum(),
                'Income_Category': self.fake.random_int(min=0, max=4),
                'No_of_product': self._adjust_value(self.fake.random_int(min=1, max=3), 6),
                'Total_Trans_Amt': self._adjust_value(self.fake.random_int(min=500, max=2000), 10000),
                'Total_Trans_Count': self._adjust_value(self.fake.random_int(min=10, max=50), 150),
                'Credit Score': self.fake.random_int(min=300, max=850),
                'Outstanding Loans': self.fake.random_int(min=0, max=50000),
                'Balance': self.fake.random_int(min=0, max=300000),
                'PhoneService': self.fake.random_int(min=0, max=1),
                'InternetService': self.fake.random_int(min=0, max=2),
                'TechSupport': self.fake.random_int(min=0, max=2),
                'PaperlessBilling': self.fake.random_int(min=0, max=1),
                'PaymentMethod': self.fake.random_int(min=0, max=3),
                'Churned': 0,
                'Time': self.time
            } for _ in range(num_new)]
            new_data_df = pd.DataFrame(new_customers)
            self.new = pd.concat([self.new, new_data_df], ignore_index=True)

    def reset(self, new_df, churn_rate=0, campaign_effectiveness=0, customer_satisfaction=0):
        self.current = new_df.copy()
        self.new = pd.DataFrame()
        self.time += timedelta(days=7)
        self.churn_rate = min(max(self.churn_rate + churn_rate, 0), 1)
        self.campaign_effectiveness = min(max(self.campaign_effectiveness + campaign_effectiveness, -1), 1)
        self.customer_satisfaction = min(max(self.customer_satisfaction + customer_satisfaction, -1), 1)

    def get_new_data(self):
        self.new.reset_index(drop=True, inplace=True)
        return self.new
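
To show how the two factors translate into numbers, the arithmetic of `_adjust_value` and `generate_new_customers` can be reproduced as plain functions. This is a sketch under the same formulas as the class above; the function names are hypothetical and the input values are chosen only for illustration.

```python
def adjustment_factor(campaign_effectiveness: float, customer_satisfaction: float) -> float:
    """Multiplier applied to a metric, using the same branches as _adjust_value."""
    if campaign_effectiveness < 0 or customer_satisfaction < 0:
        return 1 - abs(campaign_effectiveness * 0.2 + customer_satisfaction * 0.2)
    return 1 + campaign_effectiveness * 0.5 * customer_satisfaction * 0.5

def adjust_value(value, factor, max_value, min_value=1):
    """Scale a value, truncate to an integer, then clamp to [min_value, max_value]."""
    return max(min(int(value * factor), max_value), min_value)

def new_customer_count(campaign_effectiveness: float) -> int:
    """Number of customers generated in one round; zero unless effectiveness is positive."""
    if campaign_effectiveness <= 0:
        return 0
    return int(10000 * (campaign_effectiveness * 0.5))

growth = adjustment_factor(0.8, 0.8)     # about 1.16
decline = adjustment_factor(-0.7, -0.7)  # about 0.72

print(growth, decline)
print(adjust_value(9000, growth, max_value=10000))  # capped at the maximum
print(adjust_value(1, decline, max_value=6))        # floored at the minimum
print(new_customer_count(0.8), new_customer_count(1.0), new_customer_count(-0.7))
```

With positive factors the multiplier grows with the product of the two inputs, so it never exceeds 1.25; with a negative input it falls with their sum, reaching 0.6 when both are -1.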

## 2.2 Data generation and actual simulation

We first use `CustomerDataGenerator` to synthesise data for high customer satisfaction and high campaign effectiveness. For the first round, we assume that the campaign is very effective (0.8) and that customer satisfaction is high (0.8). Churn is low at 0.01.

In [9]:
# Initialize the CustomerDataGenerator with `final_df` as the initial dataset
generator = CustomerDataGenerator(final_df, churn_rate=0.01, campaign_effectiveness=0.8, customer_satisfaction=0.8)

# Step 1: Update existing data (select a random subset of existing customers to modify)
generator.update_existing_data()

# Step 2: Generate new customer entries based on campaign effectiveness
generator.generate_new_customers()

# Retrieve the new data (both modified and newly generated)
simulation_1 = generator.get_new_data()
print("Modified and New Customer Data:")
print(simulation_1)

Modified and New Customer Data:
      CLIENTNUM       Time  Income_Category  No_of_product  Total_Trans_Amt  \
0     881021982 2024-01-08                4              3             1281   
1     719464758 2024-01-08                4              5             4894   
2     828652532 2024-01-08                3              4             4713   
3     987432581 2024-01-08                4              3             3757   
4     213336513 2024-01-08                4              3             5262   
...         ...        ...              ...            ...              ...   
6497  833088394 2024-01-08                3              3             1955   
6498  818672710 2024-01-08                1              1             1835   
6499  766369881 2024-01-08                3              1             1344   
6500  294595177 2024-01-08                2              3              810   
6501  129839089 2024-01-08                2              2             2092   

      Total_Trans_C

Then, we use `DynamicCustomerSegmentation` to update our segments.

In [10]:
# Initialize with the initial dataset (final_df)
dynamic_segmentation = DynamicCustomerSegmentation(initial_data=final_df)

# Process new data
dynamic_segmentation.process_new_data(simulation_1)

# Get the segmented result after processing
segmented_result = dynamic_segmentation.get_segmented_result()
updated1 = dynamic_segmentation.get_updated_initial_data()
# Get the counts of each segment
segment_counts = segmented_result['SEGMENT'].value_counts()

# Display the counts of each segment
print(segment_counts)


SEGMENT
Casual Browsers       86354
Affluent Observers    14692
Minimal Engagers       8727
Loyal Savers           4326
Name: count, dtype: int64


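Now we simulate a second round with even higher campaign effectiveness and customer satisfaction. Because `reset` adds its arguments to the current values and clamps the result, passing 0.9 raises both from 0.8 to the cap of 1.0, and passing 0.008 raises the churn rate from 0.01 to 0.018, which is still low.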
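
Since the arguments of `reset` are increments rather than absolute values, it helps to trace the parameters round by round. The sketch below applies the same add-and-clamp rule to the starting values and to the increments passed to `reset` in this notebook; the helper `clamp` and the variable names are hypothetical.

```python
def clamp(x: float, lo: float, hi: float) -> float:
    """Restrict x to the closed interval [lo, hi]."""
    return min(max(x, lo), hi)

# Starting values used when the generator is created (round 1).
state = {"churn_rate": 0.01, "campaign_effectiveness": 0.8, "customer_satisfaction": 0.8}

# Increments passed to reset() before rounds 2, 3, and 4.
increments = [(0.008, 0.9, 0.9), (0.5, -0.8, -0.8), (0.5, -0.9, -0.9)]

history = [dict(state)]
for d_churn, d_eff, d_sat in increments:
    state["churn_rate"] = clamp(state["churn_rate"] + d_churn, 0, 1)
    state["campaign_effectiveness"] = clamp(state["campaign_effectiveness"] + d_eff, -1, 1)
    state["customer_satisfaction"] = clamp(state["customer_satisfaction"] + d_sat, -1, 1)
    history.append(dict(state))

for round_no, values in enumerate(history, start=1):
    print(round_no, {name: round(v, 3) for name, v in values.items()})
```

Reading the trace alongside the segment counts makes it clear which parameter values produced each round's output.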

In [11]:
# Reset the generator with the updated data; the numeric arguments are increments to the current parameters
generator.reset(updated1,0.008,0.9,0.9)

# Step 1: Update existing data (select a random subset of existing customers to modify)
generator.update_existing_data()

# Step 2: Generate new customer entries based on campaign effectiveness
generator.generate_new_customers()

# Retrieve the new data (both modified and newly generated)
simulation_2 = generator.get_new_data()

# Process new data
dynamic_segmentation.process_new_data(simulation_2)

# Get the segmented result after processing
segmented_result2 = dynamic_segmentation.get_segmented_result()
updated2 = dynamic_segmentation.get_updated_initial_data()
# Get the counts of each segment
segment_counts2 = segmented_result2['SEGMENT'].value_counts()

# Display the counts of each segment
print(segment_counts2)


SEGMENT
Casual Browsers       86492
Affluent Observers    14703
Minimal Engagers       9227
Loyal Savers           7775
Name: count, dtype: int64


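Next, we simulate two rounds in which satisfaction and effectiveness fall. The arguments to `reset` are again increments: round 3 passes -0.8, lowering both from 1.0 to 0.2, and round 4 passes -0.9, lowering them to -0.7. Each round also adds 0.5 to the churn rate, taking it to 0.518 in round 3 and to the cap of 1.0 in round 4.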

In [12]:
# Round 3
# Reset the generator with updated initial data and new parameters for this round
generator.reset(updated2, churn_rate=0.5, campaign_effectiveness=-0.8, customer_satisfaction=-0.8)

# Step 1: Update existing data (modify a random subset of customers)
generator.update_existing_data()

# Step 2: Generate new customer entries based on lower campaign effectiveness
generator.generate_new_customers()

# Retrieve the modified and new data
simulation_3 = generator.get_new_data()

# Process the new data in the segmentation model
dynamic_segmentation.process_new_data(simulation_3)

# Get the segmented result and updated data for this round
segmented_result3 = dynamic_segmentation.get_segmented_result()
updated3 = dynamic_segmentation.get_updated_initial_data()

# Get the segment counts
segment_counts3 = segmented_result3['SEGMENT'].value_counts()

# Display the segment counts for this round
print(segment_counts3)


SEGMENT
Casual Browsers       66984
Affluent Observers    11397
Minimal Engagers       7260
Loyal Savers           6034
Name: count, dtype: int64


In [13]:
# Round 4
# Reset the generator with the most recent data and new parameters
generator.reset(updated3, churn_rate=0.5, campaign_effectiveness=-0.9, customer_satisfaction=-0.9)

# Step 1: Update existing data (modify a random subset of customers)
generator.update_existing_data()

# Step 2: Attempt to generate new customers (none are created when campaign effectiveness is not positive)
generator.generate_new_customers()

# Retrieve the modified and new data
simulation_4 = generator.get_new_data()

# Process the new data in the segmentation model
dynamic_segmentation.process_new_data(simulation_4)

# Get the segmented result and updated data for this final round
segmented_result4 = dynamic_segmentation.get_segmented_result()
updated4 = dynamic_segmentation.get_updated_initial_data()

# Get the segment counts for the final round
segment_counts4 = segmented_result4['SEGMENT'].value_counts()

# Display the segment counts for the final round
print("Segment counts after even lower satisfaction and effectiveness (Round 4):")
print(segment_counts4)

Segment counts after even lower satisfaction and effectiveness (Round 4):
SEGMENT
Casual Browsers       35560
Affluent Observers     6116
Minimal Engagers       3864
Loyal Savers           3230
Name: count, dtype: int64


# Comparing the data

In [14]:
# from static model
base_counts = {
    'Moderate or Low Financial status, Moderate or Low Loyalty': 83945,
    'High Financial status, Low or Moderate Loyalty': 14172,
    'Low Financial status, Low Loyalty': 7543,
    'Low or Moderate Financial status, High Loyalty': 3843,
    'High Financial status, High Loyalty': 621
}

# Convert base counts to a DataFrame and add a "Total" column
base_counts_df = pd.DataFrame(base_counts, index=['Base'])
base_counts_df['Total'] = base_counts_df.sum(axis=1)

# segment_counts, segment_counts2, segment_counts3, and segment_counts4 come from the rounds above
# Convert each round's segment count to a DataFrame and add a "Total" column for each
segment_counts1_df = segment_counts.to_frame().T
segment_counts1_df.index = ['Round 1']
segment_counts1_df['Total'] = segment_counts1_df.sum(axis=1)

segment_counts2_df = segment_counts2.to_frame().T
segment_counts2_df.index = ['Round 2']
segment_counts2_df['Total'] = segment_counts2_df.sum(axis=1)

segment_counts3_df = segment_counts3.to_frame().T
segment_counts3_df.index = ['Round 3']
segment_counts3_df['Total'] = segment_counts3_df.sum(axis=1)

segment_counts4_df = segment_counts4.to_frame().T
segment_counts4_df.index = ['Round 4']
segment_counts4_df['Total'] = segment_counts4_df.sum(axis=1)

# Combine all DataFrames into one, stacking each round on top of each other
simulation_result = pd.concat([base_counts_df, segment_counts1_df, segment_counts2_df, segment_counts3_df, segment_counts4_df])
print(simulation_result)

         Moderate or Low Financial status, Moderate or Low Loyalty  \
Base                                               83945.0           
Round 1                                                NaN           
Round 2                                                NaN           
Round 3                                                NaN           
Round 4                                                NaN           

         High Financial status, Low or Moderate Loyalty  \
Base                                            14172.0   
Round 1                                             NaN   
Round 2                                             NaN   
Round 3                                             NaN   
Round 4                                             NaN   

         Low Financial status, Low Loyalty  \
Base                                7543.0   
Round 1                                NaN   
Round 2                                NaN   
Round 3                                N

As the simulation shows, when customer satisfaction and campaign effectiveness are high, the total number of customers increases (114,099 after round 1 and 118,197 after round 2), as does the number of high-value customers. When they fall and churn rises, both decrease: the total drops to 91,675 after round 3 and to 48,770 after round 4.
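
The baseline counts use descriptive labels while each round uses segment names, so the two sets of columns do not line up when concatenated. One possible way to compare them on a common set of columns is sketched below. The mapping from descriptive labels to segment names is an assumption inferred from the branches of `apply_segmentation_rule` as written (including High/High sharing a label with Low/Low), and only round 1 is shown.

```python
import pandas as pd

# Baseline counts from the static model, as listed in this notebook.
base_counts = {
    'Moderate or Low Financial status, Moderate or Low Loyalty': 83945,
    'High Financial status, Low or Moderate Loyalty': 14172,
    'Low Financial status, Low Loyalty': 7543,
    'Low or Moderate Financial status, High Loyalty': 3843,
    'High Financial status, High Loyalty': 621,
}

# Assumed mapping, inferred from the segmentation rule (not stated in the notebook).
label_to_segment = {
    'Moderate or Low Financial status, Moderate or Low Loyalty': 'Casual Browsers',
    'High Financial status, Low or Moderate Loyalty': 'Affluent Observers',
    'Low Financial status, Low Loyalty': 'Minimal Engagers',
    'Low or Moderate Financial status, High Loyalty': 'Loyal Savers',
    'High Financial status, High Loyalty': 'Minimal Engagers',
}

# Rename the labels, then add up labels that map to the same segment.
base = pd.Series(base_counts).rename(index=label_to_segment).groupby(level=0).sum()

# Round 1 counts as printed earlier in this notebook.
round_1 = pd.Series({'Casual Browsers': 86354, 'Affluent Observers': 14692,
                     'Minimal Engagers': 8727, 'Loyal Savers': 4326})

comparison = pd.DataFrame({'Base': base, 'Round 1': round_1}).T
comparison['Total'] = comparison.sum(axis=1)
print(comparison)
```

With shared column names, each row holds one count per segment, so the rounds can be compared with the baseline directly.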

## Saving the data

In [15]:
simulation_result.to_csv("../../data/processed/simulation_result.csv", index = True)