# 1. Business Understanding

1. What relevant key metrics are provided to evaluate the CTA combinations? And which CTA Copy and CTA Placement did best/worst based on the key metrics? - The main metric for evaluating the CTA combinations is click-through rate (CTR), the mean of `clickedCTA`. It measures how often a user who sees a CTA actually clicks it and goes on to visit the website, so it compares the combinations directly on their ability to attract clicks. The other key metrics are `submittedForm`, `scheduledAppointment`, and `revenue`. These describe what happens after the click, so they show whether the clicks a combination attracts turn into form submissions, appointments, and revenue.

In [None]:
import pandas as pd
import numpy as np

# Load the data
train_df = pd.read_csv('train.csv')

# For each CTA combination, compute the metrics
metrics = train_df.groupby(['ctaCopy', 'ctaPlacement']).agg({
    'clickedCTA': 'mean',  # CTR
    'submittedForm': 'mean',  # submittedForm rate
    'scheduledAppointment': 'mean',  # scheduledAppointment rate
    'mortgageVariation': 'mean',  # mortgageVariation rate
    'revenue': ['sum', 'mean']  # Total and mean revenue
}).reset_index()

# Flatten column names
metrics.columns = ['ctaCopy', 'ctaPlacement', 'CTR', 'submittedForm', 'scheduledAppointment', 'mortgageVariation', 'Total_Revenue', 'Mean_Revenue']

# Display the metrics
print("Metrics for each CTA combination:\n")
print(metrics[['ctaCopy', 'ctaPlacement', 'CTR', 'submittedForm', 'scheduledAppointment', 'mortgageVariation', 'Total_Revenue', 'Mean_Revenue']].to_string(index=False))

# Identify best and worst performing combinations
print("\n" + "="*80)
print("BEST PERFORMING COMBINATIONS:")
print("="*80)

best_ctr = metrics.loc[metrics['CTR'].idxmax()]
best_submitted = metrics.loc[metrics['submittedForm'].idxmax()]
best_appointment = metrics.loc[metrics['scheduledAppointment'].idxmax()]
best_mortgage = metrics.loc[metrics['mortgageVariation'].idxmax()]
best_revenue = metrics.loc[metrics['Total_Revenue'].idxmax()]

print(f"\nHighest CTR: {best_ctr['ctaCopy']} - {best_ctr['ctaPlacement']}")
print(f"  CTR: {best_ctr['CTR']:.4f}")

print(f"\nHighest submittedForm: {best_submitted['ctaCopy']} - {best_submitted['ctaPlacement']}")
print(f"  submittedForm: {best_submitted['submittedForm']:.4f}")

print(f"\nHighest scheduledAppointment: {best_appointment['ctaCopy']} - {best_appointment['ctaPlacement']}")
print(f"  scheduledAppointment: {best_appointment['scheduledAppointment']:.4f}")

print(f"\nHighest mortgageVariation: {best_mortgage['ctaCopy']} - {best_mortgage['ctaPlacement']}")
print(f"  mortgageVariation: {best_mortgage['mortgageVariation']:.4f}")

print(f"\nHighest Total Revenue: {best_revenue['ctaCopy']} - {best_revenue['ctaPlacement']}")
print(f"  Total Revenue: ${best_revenue['Total_Revenue']:.2f}")

print("\n" + "="*80)
print("WORST PERFORMING COMBINATIONS:")
print("="*80)

worst_ctr = metrics.loc[metrics['CTR'].idxmin()]
worst_submitted = metrics.loc[metrics['submittedForm'].idxmin()]
worst_appointment = metrics.loc[metrics['scheduledAppointment'].idxmin()]
worst_mortgage = metrics.loc[metrics['mortgageVariation'].idxmin()]
worst_revenue = metrics.loc[metrics['Total_Revenue'].idxmin()]

print(f"\nLowest CTR: {worst_ctr['ctaCopy']} - {worst_ctr['ctaPlacement']}")
print(f"  CTR: {worst_ctr['CTR']:.4f}")

print(f"\nLowest submittedForm: {worst_submitted['ctaCopy']} - {worst_submitted['ctaPlacement']}")
print(f"  submittedForm: {worst_submitted['submittedForm']:.4f}")

print(f"\nLowest scheduledAppointment: {worst_appointment['ctaCopy']} - {worst_appointment['ctaPlacement']}")
print(f"  scheduledAppointment: {worst_appointment['scheduledAppointment']:.4f}")

print(f"\nLowest mortgageVariation: {worst_mortgage['ctaCopy']} - {worst_mortgage['ctaPlacement']}")
print(f"  mortgageVariation: {worst_mortgage['mortgageVariation']:.4f}")

print(f"\nLowest Total Revenue: {worst_revenue['ctaCopy']} - {worst_revenue['ctaPlacement']}")
print(f"  Total Revenue: ${worst_revenue['Total_Revenue']:.2f}")

In [6]:
## Loading Data

In [7]:
import pandas as pd
import numpy as np

train_df = pd.read_csv('train.csv')

## Computing Metrics

For each CTA combination, we'll compute:
- **CTR** = mean(clickedCTA)
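- **Appointment rate** = mean(submittedForm), the share of impressions that led to a submitted form (this is distinct from the `scheduledAppointment` column)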
- **Revenue per impression** = total revenue / number of impressions
- **Revenue per click** = total revenue / number of clicks
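
The definitions above can be checked by hand on a tiny example. The sketch below uses a made-up six-row DataFrame (the values and the `toy` / `toy_metrics` names are hypothetical, not taken from `train.csv`) and pandas named aggregation, which is one possible way to compute all four metrics in a single pass.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: 4 impressions of one CTA combination, 2 of another.
toy = pd.DataFrame({
    "ctaCopy": ["A", "A", "A", "A", "B", "B"],
    "ctaPlacement": ["Top", "Top", "Top", "Top", "Bottom", "Bottom"],
    "clickedCTA": [1, 0, 1, 0, 0, 0],
    "submittedForm": [1, 0, 0, 0, 0, 0],
    "revenue": [100.0, 0.0, 20.0, 0.0, 0.0, 0.0],
})

toy_metrics = (
    toy.groupby(["ctaCopy", "ctaPlacement"])
    .agg(
        Impressions=("clickedCTA", "size"),          # one row per impression
        Clicks=("clickedCTA", "sum"),                # number of clicks
        CTR=("clickedCTA", "mean"),                  # clicks / impressions
        Appointment_Rate=("submittedForm", "mean"),  # named as in this notebook
        Total_Revenue=("revenue", "sum"),
    )
    .reset_index()
)

toy_metrics["Revenue_per_Impression"] = (
    toy_metrics["Total_Revenue"] / toy_metrics["Impressions"]
)
# Replace zero clicks with NaN so the division yields NaN instead of inf.
toy_metrics["Revenue_per_Click"] = (
    toy_metrics["Total_Revenue"] / toy_metrics["Clicks"].replace(0, np.nan)
)

print(toy_metrics.to_string(index=False))
```

Combination A has 2 clicks in 4 impressions and 120 in total revenue, so its revenue per impression is 120 / 4 and its revenue per click is 120 / 2. Combination B has no clicks, so its revenue per click is undefined and is reported as NaN.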

In [8]:
metrics = train_df.groupby(['ctaCopy', 'ctaPlacement']).agg({
    'clickedCTA': 'mean',  # CTR
    'submittedForm': 'mean',  # Appointment rate
    'revenue': ['sum', 'mean']  # Total revenue and mean revenue
}).reset_index()

# Rename columns
metrics.columns = ['ctaCopy', 'ctaPlacement', 'CTR', 'Appointment_Rate', 'Total_Revenue', 'Mean_Revenue']

In [9]:
# Calculate Revenue per Impression (total revenue / number of impressions)
impressions = train_df.groupby(['ctaCopy', 'ctaPlacement']).size().reset_index(name='Impressions')
metrics = metrics.merge(impressions, on=['ctaCopy', 'ctaPlacement'])
metrics['Revenue_per_Impression'] = metrics['Total_Revenue'] / metrics['Impressions']

In [10]:
# Calculate Revenue per Click (total revenue / number of clicks)
clicks = train_df[train_df['clickedCTA'] == 1].groupby(['ctaCopy', 'ctaPlacement']).size().reset_index(name='Clicks')
metrics = metrics.merge(clicks, on=['ctaCopy', 'ctaPlacement'], how='left')
metrics['Clicks'] = metrics['Clicks'].fillna(0)
metrics['Revenue_per_Click'] = metrics['Total_Revenue'] / metrics['Clicks'].replace(0, np.nan)

## Displaying Results

In [11]:
print(metrics[['ctaCopy', 'ctaPlacement', 'CTR', 'Appointment_Rate', 
               'Revenue_per_Impression', 'Revenue_per_Click']].to_string(index=False))

                                                      ctaCopy ctaPlacement      CTR  Appointment_Rate  Revenue_per_Impression  Revenue_per_Click
                  Access Your Personalized Mortgage Rates Now       Bottom 0.134821          0.117001               11.332463          84.055407
                  Access Your Personalized Mortgage Rates Now       Middle 0.161462          0.126901               11.424264          70.755295
                  Access Your Personalized Mortgage Rates Now          Top 0.186482          0.150752               12.120871          64.997587
First Time? We've Made it Easy to Find the Best Mortgage Rate       Bottom 0.153092          0.135631               12.905229          84.297472
First Time? We've Made it Easy to Find the Best Mortgage Rate       Middle 0.169922          0.135811               12.071371          71.040784
First Time? We've Made it Easy to Find the Best Mortgage Rate          Top 0.198452          0.159032               12.286923     

## Best Performing Combinations

In [12]:
best_ctr = metrics.loc[metrics['CTR'].idxmax()]
best_appt = metrics.loc[metrics['Appointment_Rate'].idxmax()]
best_rev_imp = metrics.loc[metrics['Revenue_per_Impression'].idxmax()]
best_rev_click = metrics.loc[metrics['Revenue_per_Click'].idxmax()]

print(f"Highest CTR: {best_ctr['ctaCopy']} - {best_ctr['ctaPlacement']} (CTR: {best_ctr['CTR']:.4f})")
print(f"Highest Appointment Rate: {best_appt['ctaCopy']} - {best_appt['ctaPlacement']} (Rate: {best_appt['Appointment_Rate']:.4f})")
print(f"Highest Revenue per Impression: {best_rev_imp['ctaCopy']} - {best_rev_imp['ctaPlacement']} (Revenue: ${best_rev_imp['Revenue_per_Impression']:.2f})")
print(f"Highest Revenue per Click: {best_rev_click['ctaCopy']} - {best_rev_click['ctaPlacement']} (Revenue: ${best_rev_click['Revenue_per_Click']:.2f})")

Highest CTR: Get Pre-Approved for a Mortgage in 5 Minutes - Top (CTR: 0.2118)
Highest Appointment Rate: Get Pre-Approved for a Mortgage in 5 Minutes - Top (Rate: 0.1909)
Highest Revenue per Impression: First Time? We've Made it Easy to Find the Best Mortgage Rate - Bottom (Revenue: $12.91)
Highest Revenue per Click: First Time? We've Made it Easy to Find the Best Mortgage Rate - Bottom (Revenue: $84.30)


## Worst Performing Combinations

In [13]:
worst_ctr = metrics.loc[metrics['CTR'].idxmin()]
worst_appt = metrics.loc[metrics['Appointment_Rate'].idxmin()]
worst_rev_imp = metrics.loc[metrics['Revenue_per_Impression'].idxmin()]
worst_rev_click = metrics.loc[metrics['Revenue_per_Click'].idxmin()]

print(f"Lowest CTR: {worst_ctr['ctaCopy']} - {worst_ctr['ctaPlacement']} (CTR: {worst_ctr['CTR']:.4f})")
print(f"Lowest Appointment Rate: {worst_appt['ctaCopy']} - {worst_appt['ctaPlacement']} (Rate: {worst_appt['Appointment_Rate']:.4f})")
print(f"Lowest Revenue per Impression: {worst_rev_imp['ctaCopy']} - {worst_rev_imp['ctaPlacement']} (Revenue: ${worst_rev_imp['Revenue_per_Impression']:.2f})")
print(f"Lowest Revenue per Click: {worst_rev_click['ctaCopy']} - {worst_rev_click['ctaPlacement']} (Revenue: ${worst_rev_click['Revenue_per_Click']:.2f})")

Lowest CTR: Access Your Personalized Mortgage Rates Now - Bottom (CTR: 0.1348)
Lowest Appointment Rate: Access Your Personalized Mortgage Rates Now - Bottom (Rate: 0.1170)
Lowest Revenue per Impression: Access Your Personalized Mortgage Rates Now - Bottom (Revenue: $11.33)
Lowest Revenue per Click: Get Pre-Approved for a Mortgage in 5 Minutes - Top (Revenue: $59.89)
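
One way to read these results: because CTR is clicks / impressions, revenue per click equals revenue per impression divided by CTR. A combination with a high CTR can therefore show a low revenue per click even when its revenue per impression is competitive. The sketch below is only a consistency check of that identity, using the five complete rows copied from the metrics table printed earlier; the `rows` and `implied_rpc` names are introduced here for illustration.

```python
import math

# (CTR, Revenue_per_Impression, Revenue_per_Click) copied from the
# complete rows of the metrics table printed earlier in this notebook.
rows = [
    (0.134821, 11.332463, 84.055407),
    (0.161462, 11.424264, 70.755295),
    (0.186482, 12.120871, 64.997587),
    (0.153092, 12.905229, 84.297472),
    (0.169922, 12.071371, 71.040784),
]

# Revenue per click implied by the other two columns.
implied_rpc = [rpi / ctr for ctr, rpi, _ in rows]

for (ctr, rpi, rpc), implied in zip(rows, implied_rpc):
    # The printed values are rounded, so compare with a small relative tolerance.
    matches = math.isclose(implied, rpc, rel_tol=1e-4)
    print(f"CTR={ctr:.6f}  reported={rpc:.4f}  implied={implied:.4f}  match={matches}")
```

Since the two revenue metrics differ only by the CTR factor, revenue per impression is probably the more useful of the two for comparing CTA combinations shown to similar audiences.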


2. Which groups of people tend to be more correlated or less correlated with our key metrics?

3. What ways can you manipulate the columns/dataset to create features that increase predictive power towards our key metric?

4. Besides Log Loss, what other metrics will you use to evaluate the model's performance, and why?

# 2. Exploratory Data Analysis

# 3. Baseline Model

# 4. Iteration 1: Feature Engineering

# 5. Iteration 2: Model Improvement

# 6. Final Model Selection

# 7. Test Predictions