# 1. Business Understanding

1. What relevant key metrics are provided to evaluate the CTA combinations? And which CTA Copy and CTA Placement did best/worst based on the key metrics? - The main metric for evaluating the CTA combinations is the click-through rate (CTR), computed here as the mean of `clickedCTA` for each combination of CTA copy and CTA placement. A higher CTR means a larger share of the users in that group clicked the CTA and went on to the website, so it directly measures how well a given copy and placement draw clicks. The other key metrics are `submittedForm`, `scheduledAppointment`, and `revenue`, which describe what happens after the click: whether users submit a form or schedule an appointment, and how much revenue results.

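To make the CTR definition concrete, here is a minimal sketch on a hypothetical toy table (the copy labels `A`/`B` and all values are made up for illustration). It assumes one row per user with `clickedCTA` stored as 0 or 1, in which case the group mean is the fraction of users who clicked.

```python
import pandas as pd

# Hypothetical toy data: one row per user, 1 = clicked the CTA, 0 = did not.
toy = pd.DataFrame({
    "ctaCopy": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "ctaPlacement": ["Top", "Top", "Bottom", "Bottom", "Top", "Top", "Bottom", "Bottom"],
    "clickedCTA": [1, 0, 0, 0, 1, 1, 0, 1],
})

# The mean of a 0/1 column is the share of ones, i.e. the click-through rate.
toy_ctr = toy.groupby(["ctaCopy", "ctaPlacement"])["clickedCTA"].mean()
print(toy_ctr)
```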
## Loading Data

In [39]:
import pandas as pd
import numpy as np

train_df = pd.read_csv('train.csv')

## Computing Metrics

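Converting the metric columns to numeric so that group means can be computed. With `errors='coerce'`, any value that cannot be parsed as a number becomes NaN. Note that `mortgageVariation` is NaN for every combination in the output below, which suggests the column is either non-numeric or empty, so its mean is not a usable metric: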

In [40]:
for col in ['clickedCTA', 'submittedForm', 'scheduledAppointment', 'mortgageVariation', 'revenue']:
    train_df[col] = pd.to_numeric(train_df[col], errors='coerce')

metrics = train_df.groupby(['ctaCopy', 'ctaPlacement']).agg({
    'clickedCTA': 'mean',
    'submittedForm': 'mean',
    'scheduledAppointment': 'mean',
    'mortgageVariation': 'mean',
    'revenue': 'mean'
}).reset_index()

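Because `errors='coerce'` silently replaces unparseable values with NaN, it can help to count how many values are lost per column before aggregating. The following is a minimal sketch on a hypothetical stand-in for `train_df`; the string labels in `mortgageVariation` are an assumption, since the source does not show the raw values, and `coercion_report` is a helper name invented for this example.

```python
import pandas as pd

# Hypothetical stand-in for train_df; the raw values here are assumptions.
toy = pd.DataFrame({
    "mortgageVariation": ["fixed-30", "fixed-15", None, "arm-5"],
    "revenue": ["210.5", "198", "n/a", "305.25"],
})

def coercion_report(df, cols):
    """Count, per column, the non-missing values that pd.to_numeric turns into NaN."""
    report = {}
    for col in cols:
        converted = pd.to_numeric(df[col], errors="coerce")
        # Lost values: present in the raw column but NaN after conversion.
        report[col] = int((converted.isna() & df[col].notna()).sum())
    return report

report = coercion_report(toy, ["mortgageVariation", "revenue"])
print(report)
```

A column where every non-missing value is lost is probably categorical and is better summarised with counts than with a mean.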
## Displaying Results

In [41]:
print("Metrics for each CTA combination:\n")
print(metrics[['ctaCopy', 'ctaPlacement', 'clickedCTA', 'submittedForm', 'scheduledAppointment', 'mortgageVariation', 'revenue']].to_string(index=False))

Metrics for each CTA combination:

                                                      ctaCopy ctaPlacement  clickedCTA  submittedForm  scheduledAppointment  mortgageVariation    revenue
                  Access Your Personalized Mortgage Rates Now       Bottom    0.134821       0.117001              0.051751                NaN 218.982609
                  Access Your Personalized Mortgage Rates Now       Middle    0.161462       0.126901              0.050671                NaN 225.461812
                  Access Your Personalized Mortgage Rates Now          Top    0.186482       0.150752              0.054631                NaN 221.869852
First Time? We've Made it Easy to Find the Best Mortgage Rate       Bottom    0.153092       0.135631              0.056881                NaN 226.882911
First Time? We've Made it Easy to Find the Best Mortgage Rate       Middle    0.169922       0.135811              0.053191                NaN 226.945854
First Time? We've Made it Easy to Find th

## Best Performing Combinations

In [None]:
metric_cols = ['clickedCTA', 'submittedForm', 'scheduledAppointment', 'mortgageVariation', 'revenue']

print("="*80)
print("BEST PERFORMING COMBINATIONS:")
print("="*80)

for col in metric_cols:
    # mortgageVariation appears to be all NaN after the numeric coercion.
    # idxmax() on an all-NaN column returns NaN (newer pandas versions may
    # raise instead), and metrics.loc[nan] fails with "KeyError: nan",
    # so skip any metric that has no valid values.
    if not metrics[col].notna().any():
        print(f"\nHighest {col}: not available (all values are NaN)")
        continue

    best = metrics.loc[metrics[col].idxmax()]
    # Revenue is shown in dollars; the other metrics are rates.
    value = f"${best[col]:.2f}" if col == 'revenue' else f"{best[col]:.4f}"
    print(f"\nHighest {col}: {best['ctaCopy']} - {best['ctaPlacement']}")
    print(f"  {col}: {value}")

## Worst Performing Combinations

In [None]:
metric_cols = ['clickedCTA', 'submittedForm', 'scheduledAppointment', 'mortgageVariation', 'revenue']

print("="*80)
print("WORST PERFORMING COMBINATIONS:")
print("="*80)

for col in metric_cols:
    # mortgageVariation appears to be all NaN after the numeric coercion.
    # idxmin() on an all-NaN column returns NaN (newer pandas versions may
    # raise instead), and metrics.loc[nan] fails with "KeyError: nan",
    # so skip any metric that has no valid values.
    if not metrics[col].notna().any():
        print(f"\nLowest {col}: not available (all values are NaN)")
        continue

    worst = metrics.loc[metrics[col].idxmin()]
    # Revenue is shown in dollars; the other metrics are rates.
    value = f"${worst[col]:.2f}" if col == 'revenue' else f"{worst[col]:.4f}"
    print(f"\nLowest {col}: {worst['ctaCopy']} - {worst['ctaPlacement']}")
    print(f"  {col}: {value}")

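The best and worst lookups can also be combined into one summary table, which makes the combinations easier to compare across metrics. This is a minimal sketch, not the notebook's actual result: `toy_metrics` is a hypothetical stand-in for `metrics` with made-up numbers, and `best_and_worst` is a helper name invented for this example. Metrics with no valid values are skipped.

```python
import pandas as pd

# Hypothetical stand-in for the `metrics` table; the numbers are made up.
toy_metrics = pd.DataFrame({
    "ctaCopy": ["A", "A", "B", "B"],
    "ctaPlacement": ["Top", "Bottom", "Top", "Bottom"],
    "clickedCTA": [0.18, 0.13, 0.17, 0.15],
    "revenue": [221.0, 219.0, 227.5, 226.0],
    "mortgageVariation": [float("nan")] * 4,  # all NaN, so it cannot be ranked
})

def best_and_worst(metrics, metric_cols, key_cols=("ctaCopy", "ctaPlacement")):
    """Return one row per metric with its best and worst CTA combination."""
    rows = []
    for col in metric_cols:
        valid = metrics[col].dropna()
        if valid.empty:
            continue  # nothing to rank for this metric
        best = metrics.loc[valid.idxmax()]
        worst = metrics.loc[valid.idxmin()]
        rows.append({
            "metric": col,
            "best": " - ".join(str(best[k]) for k in key_cols),
            "best_value": best[col],
            "worst": " - ".join(str(worst[k]) for k in key_cols),
            "worst_value": worst[col],
        })
    return pd.DataFrame(rows)

summary = best_and_worst(toy_metrics, ["clickedCTA", "revenue", "mortgageVariation"])
print(summary.to_string(index=False))
```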
2. Which groups of people tend to be more correlated or less correlated with our key metrics?

3. What ways can you manipulate the columns/dataset to create features that increase predictive power towards our key metric?

4. Besides Log Loss, what other metrics will you use to evaluate the model's performance, and why?

# 2. Exploratory Data Analysis

# 3. Baseline Model

# 4. Iteration 1: Feature Engineering

# 5. Iteration 2: Model Improvement

# 6. Final Model Selection

# 7. Test Predictions