# 1. Business Understanding

1. What relevant key metrics are provided to evaluate the CTA combinations? And which CTA Copy and CTA Placement did best/worst based on the key metrics?

   The main metric provided is click-through rate (CTR), computed here as the mean of `clickedCTA` for each combination of CTA copy and placement. CTR is the share of users shown a given combination who clicked the CTA, so it directly compares how well each combination draws users to the website. The other key metrics are `submittedForm`, `scheduledAppointment`, and `revenue`, which show what kinds of outcomes those clicks lead to. The cells below compute these metrics for every combination and report the best and worst performer on each.
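As a small illustration of why the mean of `clickedCTA` can be read as a CTR, the sketch below uses a made-up toy table (not `train.csv`) and assumes `clickedCTA` is coded as 0/1 with one row per user shown the CTA. Under that assumption, the mean of the column equals clicks divided by impressions.

```python
import pandas as pd

# Hypothetical toy data for illustration only; values are not from train.csv.
toy = pd.DataFrame({
    "ctaCopy": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "ctaPlacement": ["Top", "Top", "Bottom", "Bottom", "Top", "Top", "Bottom", "Bottom"],
    "clickedCTA": [1, 0, 0, 0, 1, 1, 0, 1],  # assumed to be a 0/1 indicator
})

# For a 0/1 column, mean == clicks / impressions, i.e. the click-through rate.
toy_ctr = (
    toy.groupby(["ctaCopy", "ctaPlacement"])["clickedCTA"]
    .agg(impressions="count", clicks="sum", CTR="mean")
    .reset_index()
)
print(toy_ctr.to_string(index=False))
```

Keeping the impression count next to the rate also makes it easier to see when a high or low CTR rests on only a few rows.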

## Loading Data

In [None]:
import pandas as pd
import numpy as np

train_df = pd.read_csv('train.csv')

## Computing Metrics

In [None]:
# Aggregate per CTA copy/placement combination. Named aggregation sets the
# output column names directly, so no manual column renaming is needed.
metrics = train_df.groupby(['ctaCopy', 'ctaPlacement'], as_index=False).agg(
    CTR=('clickedCTA', 'mean'),  # click-through rate
    submittedForm=('submittedForm', 'mean'),
    scheduledAppointment=('scheduledAppointment', 'mean'),
    mortgageVariation=('mortgageVariation', 'mean'),
    Total_Revenue=('revenue', 'sum'),
    Mean_Revenue=('revenue', 'mean'),
)

## Displaying Results

In [None]:
print("Metrics for each CTA combination:\n")
print(metrics.to_string(index=False))

## Best Performing Combinations

In [None]:
best_ctr = metrics.loc[metrics['CTR'].idxmax()]
best_submitted = metrics.loc[metrics['submittedForm'].idxmax()]
best_appointment = metrics.loc[metrics['scheduledAppointment'].idxmax()]
best_mortgage = metrics.loc[metrics['mortgageVariation'].idxmax()]
best_revenue = metrics.loc[metrics['Total_Revenue'].idxmax()]

print("="*80)
print("BEST PERFORMING COMBINATIONS:")
print("="*80)

print(f"\nHighest CTR: {best_ctr['ctaCopy']} - {best_ctr['ctaPlacement']}")
print(f"  CTR: {best_ctr['CTR']:.4f}")

print(f"\nHighest submittedForm: {best_submitted['ctaCopy']} - {best_submitted['ctaPlacement']}")
print(f"  submittedForm: {best_submitted['submittedForm']:.4f}")

print(f"\nHighest scheduledAppointment: {best_appointment['ctaCopy']} - {best_appointment['ctaPlacement']}")
print(f"  scheduledAppointment: {best_appointment['scheduledAppointment']:.4f}")

print(f"\nHighest mortgageVariation: {best_mortgage['ctaCopy']} - {best_mortgage['ctaPlacement']}")
print(f"  mortgageVariation: {best_mortgage['mortgageVariation']:.4f}")

print(f"\nHighest Total Revenue: {best_revenue['ctaCopy']} - {best_revenue['ctaPlacement']}")
print(f"  Total Revenue: ${best_revenue['Total_Revenue']:.2f}")

## Worst Performing Combinations

In [None]:
worst_ctr = metrics.loc[metrics['CTR'].idxmin()]
worst_submitted = metrics.loc[metrics['submittedForm'].idxmin()]
worst_appointment = metrics.loc[metrics['scheduledAppointment'].idxmin()]
worst_mortgage = metrics.loc[metrics['mortgageVariation'].idxmin()]
worst_revenue = metrics.loc[metrics['Total_Revenue'].idxmin()]

print("="*80)
print("WORST PERFORMING COMBINATIONS:")
print("="*80)

print(f"\nLowest CTR: {worst_ctr['ctaCopy']} - {worst_ctr['ctaPlacement']}")
print(f"  CTR: {worst_ctr['CTR']:.4f}")

print(f"\nLowest submittedForm: {worst_submitted['ctaCopy']} - {worst_submitted['ctaPlacement']}")
print(f"  submittedForm: {worst_submitted['submittedForm']:.4f}")

print(f"\nLowest scheduledAppointment: {worst_appointment['ctaCopy']} - {worst_appointment['ctaPlacement']}")
print(f"  scheduledAppointment: {worst_appointment['scheduledAppointment']:.4f}")

print(f"\nLowest mortgageVariation: {worst_mortgage['ctaCopy']} - {worst_mortgage['ctaPlacement']}")
print(f"  mortgageVariation: {worst_mortgage['mortgageVariation']:.4f}")

print(f"\nLowest Total Revenue: {worst_revenue['ctaCopy']} - {worst_revenue['ctaPlacement']}")
print(f"  Total Revenue: ${worst_revenue['Total_Revenue']:.2f}")
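The best and worst lookups above repeat the same `idxmax`/`idxmin` pattern for every metric. As a rough sketch of one way to condense that, the helper below loops over the metric columns and returns a single summary table. `summarize_extremes` and the toy values are hypothetical and only illustrate the pattern; they are not part of the original analysis.

```python
import pandas as pd

def summarize_extremes(metrics, metric_cols):
    """Return the best and worst CTA combination for each metric column.

    Hypothetical helper. Ties are resolved by idxmax/idxmin, which return
    the first matching row.
    """
    rows = []
    for col in metric_cols:
        best = metrics.loc[metrics[col].idxmax()]
        worst = metrics.loc[metrics[col].idxmin()]
        rows.append({
            "metric": col,
            "best": f"{best['ctaCopy']} - {best['ctaPlacement']}",
            "best_value": best[col],
            "worst": f"{worst['ctaCopy']} - {worst['ctaPlacement']}",
            "worst_value": worst[col],
        })
    return pd.DataFrame(rows)

# Made-up values for illustration only; not results from train.csv.
toy_metrics = pd.DataFrame({
    "ctaCopy": ["A", "A", "B"],
    "ctaPlacement": ["Top", "Bottom", "Top"],
    "CTR": [0.10, 0.05, 0.20],
    "Total_Revenue": [300.0, 500.0, 100.0],
})

extremes = summarize_extremes(toy_metrics, ["CTR", "Total_Revenue"])
print(extremes.to_string(index=False))
```

On the real data, the same call could be made with the `metrics` table and the metric column names computed earlier. Note that in the toy table the combination with the highest CTR has the lowest total revenue, which is why it helps to look at each metric side by side rather than at CTR alone.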

2. Which groups of people tend to be more correlated or less correlated with our key metrics?

3. What ways can you manipulate the columns/dataset to create features that increase predictive power towards our key metric?

4. Besides Log Loss, what other metrics will you use to evaluate the model's performance, and why?

# 2. Exploratory Data Analysis

# 3. Baseline Model

# 4. Iteration 1: Feature Engineering

# 5. Iteration 2: Model Improvement

# 6. Final Model Selection

# 7. Test Predictions