What strategies can we implement to optimize our marketing campaigns in real-time?
- Create an algorithm for dynamic campaign adjustment based on real-time performance
metrics.
- Simulate the impact of proposed adjustments on campaign effectiveness

CODE WAIT TO BE ADJUSTED BASED ON THE RECOMMENDATION SYSTEM OUTCOME

### RoadMap
1. Data Cleaning
2. Data Merging
3. Online Learning with SGD
4. Simulation with A/B testing

### Part 1 Data Cleaning

#### transaction data

In [113]:
import numpy as np
import pandas as pd
from datetime import datetime
from sklearn.preprocessing import LabelEncoder, MinMaxScaler
from sklearn.model_selection import train_test_split

In [57]:
transaction_data = pd.read_csv("../data/bank_transactions.csv")

In [58]:
transaction_data.describe

<bound method NDFrame.describe of         TransactionID CustomerID CustomerDOB CustGender   CustLocation  \
0                  T1   C5841053     10/1/94          F     JAMSHEDPUR   
1                  T2   C2142763      4/4/57          M        JHAJJAR   
2                  T3   C4417068    26/11/96          F         MUMBAI   
3                  T4   C5342380     14/9/73          F         MUMBAI   
4                  T5   C9031234     24/3/88          F    NAVI MUMBAI   
...               ...        ...         ...        ...            ...   
1048562      T1048563   C8020229      8/4/90          M      NEW DELHI   
1048563      T1048564   C6459278     20/2/92          M         NASHIK   
1048564      T1048565   C6412354     18/5/89          M      HYDERABAD   
1048565      T1048566   C6420483     30/8/78          M  VISAKHAPATNAM   
1048566      T1048567   C8337524      5/3/84          M           PUNE   

         CustAccountBalance TransactionDate  TransactionTime  \
0            

In [59]:
## remove duplicate
transaction_data = transaction_data.drop_duplicates()

In [60]:
transaction_data.isnull().sum()

TransactionID                 0
CustomerID                    0
CustomerDOB                3397
CustGender                 1100
CustLocation                151
CustAccountBalance         2369
TransactionDate               0
TransactionTime               0
TransactionAmount (INR)       0
dtype: int64

In [61]:
## we are dealing with banking campaign instead of fraud detection, so we consider that CustLocation is not a useful feature
transaction_data = transaction_data.drop('CustLocation', axis=1)

In [62]:
## given that this data is of customer transaction, we can see if the missing value is for just some particular customer
columns_to_check = ['CustomerDOB', 'CustGender', 'CustAccountBalance']
customers_with_missing_data = transaction_data[transaction_data[columns_to_check].isna().any(axis=1)]['CustomerID']

In [63]:
unique_customer_ids = transaction_data['CustomerID'].unique()
print(f"Number of unique CustomerIDs: {len(unique_customer_ids)}")

Number of unique CustomerIDs: 884265


In [64]:
transactions_with_missing_data = transaction_data[transaction_data[columns_to_check].isna().any(axis=1)]
all_transactions_for_customers = transaction_data[transaction_data['CustomerID'].isin(customers_with_missing_data)]
other_transactions = all_transactions_for_customers[~all_transactions_for_customers['TransactionID'].isin(transactions_with_missing_data['TransactionID'])]

In [65]:
## we can remove those customers that only have transaction with missing data consider their size is more relative to the whole dataset
customers_with_other_transactions = other_transactions['CustomerID'].unique()
print(f"Number of unique CustomerIDs: {len(customers_with_other_transactions)}")
customers_with_only_missing_transactions = transactions_with_missing_data[
    ~transactions_with_missing_data['CustomerID'].isin(customers_with_other_transactions)]

Number of unique CustomerIDs: 1990


In [66]:
customers_with_only_missing_transactions = customers_with_only_missing_transactions['CustomerID'].unique()
transaction_data = transaction_data[~transaction_data['CustomerID'].isin(customers_with_only_missing_transactions)]

now, we try to impute missing data based on the other transaction of the same customer

In [67]:
unique_customer_ids = transaction_data['CustomerID'].unique()
print(f"Number of unique CustomerIDs: {len(unique_customer_ids)}")

Number of unique CustomerIDs: 879472


In [68]:
## we generally assume that transaction later would have a transaction ID with larger number

transaction_data_without = transaction_data[~transaction_data['CustomerID'].isin(customers_with_other_transactions)].copy()
transaction_data_with = transaction_data[transaction_data['CustomerID'].isin(customers_with_other_transactions)].copy()

transaction_data_with['TransactionID_numeric'] = transaction_data_with['TransactionID'].str.extract(r'(\d+)').astype(int)

grouped = transaction_data_with.groupby('CustomerID', group_keys=False)
transaction_data_with = grouped.apply(lambda x: x.sort_values('TransactionID_numeric'))


  transaction_data_with = grouped.apply(lambda x: x.sort_values('TransactionID_numeric'))


In [69]:
transaction_data_with['CustomerDOB'] = grouped['CustomerDOB'].apply(lambda x: x.ffill().bfill())
transaction_data_with['CustGender'] = grouped['CustGender'].apply(lambda x: x.ffill().bfill())

In [70]:
transaction_data_with = transaction_data_with.reset_index(drop=True)
transaction_data_with.isnull().sum()

TransactionID                0
CustomerID                   0
CustomerDOB                  0
CustGender                   0
CustAccountBalance         702
TransactionDate              0
TransactionTime              0
TransactionAmount (INR)      0
TransactionID_numeric        0
dtype: int64

In [71]:
def impute_balance(group):
    # Ensure transactions are sorted by TransactionID or TransactionID_numeric
    group = group.sort_values('TransactionID_numeric')
    
    # Forward fill and backward fill for known balance values (handles missing values at the start and end)
    group['CustAccountBalance'] = group['CustAccountBalance'].ffill().bfill()
    
    # Loop through transactions and impute missing balances based on previous balance and TransactionAmount
    for i in range(1, len(group)):
        if pd.isna(group.iloc[i]['CustAccountBalance']):
            group.iloc[i, group.columns.get_loc('CustAccountBalance')] = (
                group.iloc[i-1]['CustAccountBalance'] + group.iloc[i]['TransactionAmount (INR)']
            )
    
    return group

grouped = transaction_data_with.groupby('CustomerID', group_keys=False)
transaction_data_with = grouped.apply(impute_balance)

missing_balances_after_imputation = transaction_data_with[transaction_data_with['CustAccountBalance'].isna()]



  transaction_data_with = grouped.apply(impute_balance)


In [72]:
transaction_data_with = transaction_data_with.drop("TransactionID_numeric", axis=1)

In [73]:
transaction_data_cleaned = pd.concat([transaction_data_without,transaction_data_with], axis=0, ignore_index=True)

In [74]:
transaction_data_cleaned.loc[:,'CustGender'] = transaction_data_cleaned.loc[:,'CustGender'].replace({"F":1, "M":0})
transaction_data_cleaned['TransactionDate'] = pd.to_datetime(transaction_data_cleaned['TransactionDate'], errors='coerce', format='%d/%m/%y') 

In [75]:
transaction_data_cleaned['CustomerDOB'] = transaction_data_cleaned['CustomerDOB'].str.strip()

In [76]:
temp = transaction_data_cleaned.copy()
temp['CustomerDOB'] = pd.to_datetime(temp['CustomerDOB'], format='%d/%m/%y', errors='coerce')
nullindex = temp[temp['CustomerDOB'].isna()].index

In [77]:
transaction_data_cleaned['CustomerDOB'][nullindex] 

16         1/1/1800
22         1/1/1800
28         1/1/1800
34         1/1/1800
150        1/1/1800
             ...   
1043696    1/1/1800
1043704    1/1/1800
1043705    1/1/1800
1043725    1/1/1800
1043739    1/1/1800
Name: CustomerDOB, Length: 56633, dtype: object

In [78]:
### it's easy to find that they dont have DOB record for those row, we can only remove them
transaction_data_cleaned['CustomerDOB'] = pd.to_datetime(transaction_data_cleaned['CustomerDOB'], format='%d/%m/%y', errors='coerce')

In [79]:
transaction_data_cleaned['CustomerDOB'] = transaction_data_cleaned['CustomerDOB'].apply(lambda x: x if x.year <= 2024 else x.replace(year=x.year - 100))
transaction_data_cleaned = transaction_data_cleaned.dropna(subset=['CustomerDOB'])

In [80]:
transaction_data_cleaned['TransactionTime'] = transaction_data_cleaned['TransactionTime'].astype(str).str.zfill(6)
transaction_data_cleaned['TransactionTime'] = transaction_data_cleaned['TransactionTime'].apply(lambda x: datetime.strptime(x, '%H%M%S').time())

In [82]:
transaction_data_cleaned.info()

<class 'pandas.core.frame.DataFrame'>
Index: 987133 entries, 0 to 1043765
Data columns (total 8 columns):
 #   Column                   Non-Null Count   Dtype         
---  ------                   --------------   -----         
 0   TransactionID            987133 non-null  object        
 1   CustomerID               987133 non-null  object        
 2   CustomerDOB              987133 non-null  datetime64[ns]
 3   CustGender               987133 non-null  object        
 4   CustAccountBalance       987133 non-null  float64       
 5   TransactionDate          987133 non-null  datetime64[ns]
 6   TransactionTime          987133 non-null  object        
 7   TransactionAmount (INR)  987133 non-null  float64       
dtypes: datetime64[ns](2), float64(2), object(4)
memory usage: 67.8+ MB


In [83]:
def calculate_age(transaction_date, dob):
    if pd.isna(transaction_date) or pd.isna(dob):
        return None  # Return None if either date is missing
    age = transaction_date.year - dob.year - ((transaction_date.month, transaction_date.day) < (dob.month, dob.day))
    return age

# Step 3: Apply the function row-wise to calculate 'CustAge'
transaction_data_cleaned['CustAge'] = transaction_data_cleaned.apply(lambda row: calculate_age(row['TransactionDate'], row['CustomerDOB']), axis=1)


In [84]:
transaction_data = transaction_data_cleaned

In [88]:
campaign = pd.read_csv("../data/campaign_data.csv", delimiter = ";")

In [89]:
campaign  = campaign.drop_duplicates()
campaign.isnull().sum()

age          0
job          0
marital      0
education    0
default      0
balance      0
housing      0
loan         0
contact      0
day          0
month        0
duration     0
campaign     0
pdays        0
previous     0
poutcome     0
y            0
dtype: int64

In [90]:
campaign = campaign.rename({"y": "outcome"},axis=1)

In [91]:
unique_values = {}
for col in campaign.columns:
    unique_values[col] = campaign[col].unique()
for col, values in unique_values.items():
    print(f"Unique values in column {col}:")
    print(values)
    print("\n")

Unique values in column age:
[58 44 33 47 35 28 42 43 41 29 53 57 51 45 60 56 32 25 40 39 52 46 36 49
 59 37 50 54 55 48 24 38 31 30 27 34 23 26 61 22 21 20 66 62 83 75 67 70
 65 68 64 69 72 71 19 76 85 63 90 82 73 74 78 80 94 79 77 86 95 81 18 89
 84 87 92 93 88]


Unique values in column job:
['management' 'technician' 'entrepreneur' 'blue-collar' 'unknown'
 'retired' 'admin.' 'services' 'self-employed' 'unemployed' 'housemaid'
 'student']


Unique values in column marital:
['married' 'single' 'divorced']


Unique values in column education:
['tertiary' 'secondary' 'unknown' 'primary']


Unique values in column default:
['no' 'yes']


Unique values in column balance:
[ 2143    29     2 ...  8205 14204 16353]


Unique values in column housing:
['yes' 'no']


Unique values in column loan:
['no' 'yes']


Unique values in column contact:
['unknown' 'cellular' 'telephone']


Unique values in column day:
[ 5  6  7  8  9 12 13 14 15 16 19 20 21 23 26 27 28 29 30  2  3  4 11 17
 18 24 25  1 

In [92]:
## null values are in the form of "unknown"
## according to the feature description, "other" is considered "nonexistent" - that is: they dont have last campaign
campaign['poutcome'] = campaign['poutcome'].replace({"other":"unexistent"})

In [93]:
campaign['month'] = pd.to_datetime(campaign['month'], format='%b').dt.month

In [94]:
## now we convert unknown to null values
campaign['contact'] = campaign['contact'].replace({"unknown":np.nan})
campaign['education'] = campaign['education'].replace({"unknown":np.nan})
campaign['job'] = campaign['job'].replace({"unknown":np.nan})

In [95]:
campaign.isnull().sum()

age              0
job            288
marital          0
education     1857
default          0
balance          0
housing          0
loan             0
contact      13020
day              0
month            0
duration         0
campaign         0
pdays            0
previous         0
poutcome         0
outcome          0
dtype: int64

In [96]:
## since the missing rows of job and education are of small size compared to the whole dataset, we can simply drop it
## since campaign can only be conducted based on having contact, so we will impute the mode value(given that they all have last contact)
campaign['contact'] = campaign['contact'].fillna(campaign['contact'].mode()[0])

In [97]:
campaign = campaign.dropna(subset=['job'])
campaign = campaign.dropna(subset=['education'])

In [98]:
## convert job to income group based on assumption on jobs
job_to_income_group = {
    'management': '$120K +',
    'entrepreneur': '$120K +',
    'self-employed': '$120K +',
    'technician': '$80K - $120K',
    'admin.': '$80K - $120K',
    'services': '$60K - $80K',
    'blue-collar': '$40K - $60K',
    'housemaid': '$40K - $60K',
    'unemployed': 'Less than $40K',
    'student': 'Less than $40K',
    'retired': 'Less than $40K'
}

In [99]:
campaign.loc[:,"income"] =campaign['job'].map(job_to_income_group)

In [100]:
campaign_data = campaign

In [101]:
transaction_data

Unnamed: 0,TransactionID,CustomerID,CustomerDOB,CustGender,CustAccountBalance,TransactionDate,TransactionTime,TransactionAmount (INR),CustAge
0,T1,C5841053,1994-01-10,1,17819.05,2016-08-02,14:32:07,0.000016,22
1,T2,C2142763,1957-04-04,0,2270.69,2016-08-02,14:18:58,0.017948,59
2,T3,C4417068,1996-11-26,1,17874.44,2016-08-02,14:27:12,0.000294,19
3,T4,C5342380,1973-09-14,1,866503.21,2016-08-02,14:27:14,0.001320,42
4,T5,C9031234,1988-03-24,1,6714.43,2016-08-02,18:11:56,0.001130,28
...,...,...,...,...,...,...,...,...,...
1043761,T1047249,C8510525,1992-04-10,0,98896.96,2016-09-18,00:58:52,0.001923,24
1043762,T1047276,C1640779,1991-11-02,1,4410.16,2016-09-18,07:25:27,0.000044,24
1043763,T1047576,C6119081,1984-02-23,0,75947.08,2016-09-18,17:48:38,0.000160,32
1043764,T1047763,C3827041,1995-05-18,0,91.36,2016-09-18,19:31:22,0.000182,21


## Model building planning 
1. what kind of data we need:
    - real-time performance metrics
    - demographic features
    - financial bahaviors

#### Real time performance metrics:
- `click-through rate` number of successful outcomes per campaign
- `conversion rate` number of desired action (making transaction) per campaign
- `transaction frequency` number of transaction from past month

2. What we need to do:
    We now have two separate data - campaign and transaction data
    We need time-series data that consistently updated the outcome and the number of transaction
    For simplicity, we decide to randomly assign the possible outcome and number of campaign based on the distribution
    - distribution of campaign number
    - for each campaign number, there is corresponding percentage of outcome (failure vs success)

### Data Preprocessing

In [102]:
transaction_data.isnull().sum()

TransactionID              0
CustomerID                 0
CustomerDOB                0
CustGender                 0
CustAccountBalance         0
TransactionDate            0
TransactionTime            0
TransactionAmount (INR)    0
CustAge                    0
dtype: int64

In [103]:
transaction_data['CustAge'] = transaction_data['CustAge'].astype(int)
transaction_data['TransactionDate'] = pd.to_datetime(transaction_data['TransactionDate'])
transaction_data['TransactionAmount (INR)'] = MinMaxScaler().fit_transform(transaction_data[['TransactionAmount (INR)']])

In [104]:
campaign_data['day'] = campaign_data['day'].astype(int)
campaign_data['duration'] = campaign_data['duration'].astype(int)
le_contact = LabelEncoder()
le_outcome = LabelEncoder()
campaign_data['contact_encoded'] = le_contact.fit_transform(campaign_data['contact'])
campaign_data['outcome_encoded'] = le_outcome.fit_transform(campaign_data['outcome'])


In [109]:
context_distribution = campaign_data['campaign'].value_counts(normalize=True)  
outcome_distribution = campaign_data.groupby('campaign')['outcome'].value_counts(normalize=True).unstack(fill_value=0) 

In [111]:
from joblib import Parallel, delayed

def generate_context_outcome(n_samples, context_dist, outcome_dist):
    number_of_campaigns = np.random.choice(context_dist.index, size=n_samples, p=context_dist.values)
    
    outcomes = []
    for context in number_of_campaigns:
        outcome_probs = outcome_dist.loc[context]
        outcome = np.random.choice(outcome_probs.index, p=outcome_probs.values)
        outcomes.append(outcome)
        
    return number_of_campaigns, outcomes

n_samples = len(transaction_data)
batch_size = 10000 
n_batches = n_samples // batch_size + 1

results = Parallel(n_jobs=-1)(delayed(generate_context_outcome)(batch_size, context_distribution, outcome_distribution) for _ in range(n_batches))

number_of_campaigns = np.concatenate([result[0] for result in results])
outcomes = np.concatenate([result[1] for result in results])

transaction_data['number_of_campaigns'] = number_of_campaigns[:n_samples]
transaction_data['outcome'] = outcomes[:n_samples]

### using multi-arm bandit as the algorithm for reinforcement learning
- actions = [change offers, change campaign timing]
- implemented recommendation system for changing offers
- (possible) implemented segmentation update for recommendation system

In [115]:
train_cutoff_date = transaction_data['TransactionDate'].quantile(0.7)
train_data = transaction_data[transaction_data['TransactionDate'] <= train_cutoff_date]
test_data = transaction_data[transaction_data['TransactionDate'] > train_cutoff_date]

In [118]:
def feature_engineering_optimized(data):
    data.loc[:,'TransactionDate'] = pd.to_datetime(data['TransactionDate'])
    data = data.sort_values(['CustomerID', 'TransactionDate'])
    
    data['transaction_diff'] = data.groupby('CustomerID')['TransactionDate'].diff().dt.days.fillna(0)
    data['transaction_frequency'] = data['transaction_diff'].rolling(window=30, min_periods=1).apply(lambda x: (x <= 30).sum())

    successful_outcomes = data[data['outcome'] == 1].groupby('CustomerID')['outcome'].transform('count')
    data['successful_outcomes'] = data['CustomerID'].map(successful_outcomes).fillna(0)
    data['click_through_rate'] = data['successful_outcomes'] / data['number_of_campaigns']

    transaction_counts = data.groupby('CustomerID')['TransactionID'].transform('count')
    data['conversion_rate'] = transaction_counts / data['number_of_campaigns']

    w1, w2, w3 = 0.3, 0.5, 0.2 ## wait to be adjusted
    data['engagement_score'] = (w1 * data['conversion_rate'] +
                                w2 * data['click_through_rate'] +
                                w3 * data['transaction_frequency'])

    data = data.drop(columns=['transaction_diff'])

    return data

train_data = feature_engineering_optimized(train_data)
test_data = feature_engineering_optimized(test_data)


       CustomerID  transaction_frequency  click_through_rate  conversion_rate  \
171876   C1010011                    1.0                 0.0         1.000000   
359631   C1010012                    2.0                 0.0         0.333333   
88709    C1010014                    3.0                 0.0         1.000000   
249304   C1010014                    4.0                 0.0         2.000000   
397693   C1010024                    5.0                 0.0         1.000000   

        engagement_score  
171876               0.5  
359631               0.5  
88709                0.9  
249304               1.4  
397693               1.3  
       CustomerID  transaction_frequency  click_through_rate  conversion_rate  \
33406    C1010011                    1.0                 0.0              0.5   
963368   C1010018                    2.0                 0.0              1.0   
889959   C1010038                    3.0                 0.0              0.5   
880234   C1010041          

In [119]:
features_to_scale = ['conversion_rate', 'click_through_rate', 'transaction_frequency', 'engagement_score']
scaler = MinMaxScaler()

train_data[features_to_scale] = scaler.fit_transform(train_data[features_to_scale])
test_data[features_to_scale] = scaler.transform(test_data[features_to_scale])

In [120]:
action_space = ['no_offer', 'standard_offer', 'premium_offer']
num_actions = len(action_space)
preferences = np.zeros(num_actions)  # initialize all actions with equal perference zero
probabilities = np.exp(preferences) / np.sum(np.exp(preferences))  # use gradient bandit
alpha = 0.1  # learning rate(step size)
baseline = 0  

def compute_reward(engagement_score):
    return engagement_score

# train the model
for index, row in train_data.iterrows():
    action = np.random.choice(action_space, p=probabilities) # choose an action
    action_index = action_space.index(action)
    
    # find reward and update baseline
    reward = compute_reward(row['engagement_score'])
    baseline = baseline + 0.1 * (reward - baseline) 

    # update preference
    for i in range(num_actions):
        if i == action_index:
            preferences[i] += alpha * (reward - baseline) * (1 - probabilities[i])
        else:
            preferences[i] -= alpha * (reward - baseline) * probabilities[i]
    
    # update action probabilities
    probabilities = np.exp(preferences) / np.sum(np.exp(preferences))


We consider that higher engagement score result in higher probability of accepting offer, so the system is built to maximize engagement score. Yet, here we name it as click_through rate to avoid confusion with segmentation rule as in subgroupA

In [121]:
# train on test set
def evaluate_click_through_rate(test_data, action_space, probabilities):
    click_through_results = {action: [] for action in action_space}
    
    for index, row in test_data.iterrows():
        action = np.random.choice(action_space, p=probabilities)
        reward = compute_reward(row['engagement_score'])
        
        if action == 'no_offer':
            click_through_results['no_offer'].append(reward)
        elif action == 'standard_offer':
            click_through_results['standard_offer'].append(reward)
        elif action == 'premium_offer':
            click_through_results['premium_offer'].append(reward)
    
    for action, rewards in click_through_results.items():
        avg_click_through_rate = np.mean(rewards) if rewards else 0
        print(f"Action: {action}, Average Click-Through Rate on Test Data: {avg_click_through_rate:.4f}")

evaluate_click_through_rate(test_data, action_space, probabilities)

Action: no_offer, Average Click-Through Rate on Test Data: 0.7798
Action: standard_offer, Average Click-Through Rate on Test Data: 0.7798
Action: premium_offer, Average Click-Through Rate on Test Data: 0.7797


### To build an optimizing algorithm

1. Objective and Metrics:
The primary objective was to dynamically optimize campaign actions to maximize customer engagement. Although the metric displayed is labeled as `click-through rate` (CTR) for clarity and consistency across segments, it actually represents the average `engagement score`—a composite score that includes `click-through rate`, `conversion rate`, and `transaction frequency`. This allows for a holistic view of campaign effectiveness.

2. Adjustment Process:
The Gradient Bandit model was applied to adaptively select from three actions: `no_offer`, `standard_offer`, and `premium_offer`. Actions were chosen based on the model’s ongoing evaluation of engagement scores, which provide an aggregated measure of customer responsiveness to different campaign actions.

3. Test Data Results and Interpretation:

The average engagement score for each action (labeled as "CTR" for ease of comparison) was nearly identical, with `no_offer` and `standard_offer` achieving an average score of 0.7798, and `premium_offer` slightly lower at 0.7797.

Interpretation: This uniformity across actions suggests that, for the given test data, all actions performed similarly in terms of driving engagement. The small differences imply that no single action was significantly more effective than the others in enhancing customer engagement.

4. Implications for Campaign Adjustment:

These results suggest that further customization of actions, potentially based on refined customer segmentation, could reveal more meaningful differences in engagement.
While each action's impact on engagement was similar in this test, the model's dynamic adjustment capabilities are adaptable and could be highly effective in other scenarios where the variation in engagement across actions is more pronounced.

#### Given that I dont have exact custome recommendation system, below is a newly built model that wait for the adjustment based on recomendation system, besides, possible segmentation system probably required for adding segmentation system, need someone to deal with the generation. I believe this model would be better compared to the above one that is simplified.


In [None]:
run_function = False # you will not be able to run it until I fully adjusted it
historical_data = pd.DataFrame()  # suppose it is updated by the following transaction

# simulate real-time data entries and feature engineering for it
def process_transaction_data(new_transaction):
    global run_function
    
    if not run_function:
        print("The function is currently restrained and will not run.")
        return 
    global historical_data

    new_transaction['TransactionDate'] = pd.to_datetime(new_transaction['TransactionDate'])
    # add to the history
    historical_data = pd.concat([historical_data, new_transaction], ignore_index=True)
    
    # sorted by date
    historical_data = historical_data.sort_values(['CustomerID', 'TransactionDate']).reset_index(drop=True)
    
    historical_data['transaction_diff'] = historical_data.groupby('CustomerID')['TransactionDate'].diff().dt.days.fillna(0)
    historical_data['transaction_frequency'] = historical_data.groupby('CustomerID')['transaction_diff'].transform(lambda x: (x <= 30).sum())
    
    successful_outcomes = historical_data[historical_data['outcome'] == 1].groupby('CustomerID')['outcome'].transform('count')
    historical_data['successful_outcomes'] = historical_data['CustomerID'].map(successful_outcomes).fillna(0)
    historical_data['click_through_rate'] = historical_data['successful_outcomes'] / historical_data['number_of_campaigns']
    transaction_counts = historical_data.groupby('CustomerID')['TransactionID'].transform('count')
    historical_data['conversion_rate'] = transaction_counts / historical_data['number_of_campaigns']
    
    w1, w2, w3 = 0.3, 0.5, 0.2
    historical_data['engagement_score'] = (w1 * historical_data['conversion_rate'] +
                                           w2 * historical_data['click_through_rate'] +
                                           w3 * historical_data['transaction_frequency'])
    
    historical_data = historical_data.drop(columns=['transaction_diff'])
    

    return historical_data.iloc[-1]

# need to be adjusted based on segmentation system application
# too long, I really cannot handle it
def update_segmentation(row):
    if row['CustAge'] > 40 and row['CustAccountBalance'] > 100000:
        return 'high_value_segment'
    elif row['CustAge'] <= 40 and row['CustAccountBalance'] <= 100000:
        return 'low_value_segment'
    else:
        return 'mid_value_segment'

# need to be adjusted based on recommended system
def recommend_action_space(segmentation_label):
    if segmentation_label == 'high_value_segment':
        return ['premium_offer', 'special_discount']
    elif segmentation_label == 'low_value_segment':
        return ['standard_offer', 'basic_discount']
    else:
        return ['standard_offer', 'no_offer']
    
def multi_armed_bandit(action_space, engagement_score):
    num_actions = len(action_space)
    preferences = np.zeros(num_actions)
    probabilities = np.exp(preferences) / np.sum(np.exp(preferences))
    alpha = 0.1  
    baseline = 0 

    action = np.random.choice(action_space, p=probabilities)
    action_index = action_space.index(action)
    
    reward = engagement_score
    baseline = baseline + 0.1 * (reward - baseline)
    
    for i in range(num_actions):
        if i == action_index:
            preferences[i] += alpha * (reward - baseline) * (1 - probabilities[i])
        else:
            preferences[i] -= alpha * (reward - baseline) * probabilities[i]
    
    probabilities = np.exp(preferences) / np.sum(np.exp(preferences))
    
    return action, probabilities


def process_and_recommend(new_transaction):
    processed_data = process_transaction_data(new_transaction)
    processed_data['segmentation_label'] = update_segmentation(processed_data)
    action_space = recommend_action_space(processed_data['segmentation_label'])
    
    selected_action, action_probabilities = multi_armed_bandit(action_space, processed_data['engagement_score'])
    
    print(f"Selected Action: {selected_action}, Probabilities: {action_probabilities}")

# sample input new data
new_transaction = pd.DataFrame({
    'TransactionID': ['T1048568'],
    'CustomerID': ['C1234567'],
    'TransactionDate': ['2023-10-20'],
    'TransactionAmount (INR)': [5000],
    'outcome': [1],
    'number_of_campaigns': [3],
    'CustAge': [45],
    'CustAccountBalance': [150000]
})

process_and_recommend(new_transaction)


possible consideration:
- maybe we can use change contact-timing(day of the week?) as an action, but I'm not sure if this would be valuable
- the structure is easy, given that we already have day and month of last campaign, we can assign distribution of campaign succuess and failure based on the day of the week of the last campaign
- this follows similar structure with what I have done for adding campaign number to the transaction data, and will be easily implemented if you desired, no worry.