# Naive Outcome Modeling and Profit Baseline

This notebook establishes a benchmark targeting strategy using a standard predictive model.

The objective is to demonstrate why predicting purchase probability is insufficient for causal decision-making.

We compare:

1. Random targeting
2. Naive predictive targeting
3. (Later) Uplift-based targeting

This baseline will quantify how much profit is lost when incremental impact is ignored.

In [None]:
import sys
import os
sys.path.append(os.path.abspath(".."))

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

In [None]:
df = pd.read_csv("../data/simulated_campaign_data.csv")

## Standard Outcome Prediction Model

We train a standard supervised model to estimate:

\[
P(Y=1 \mid X)
\]

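This model is trained without the treatment indicator. It therefore ignores treatment assignment and the counterfactual question of what each customer would have done without contact.

It predicts the probability of conversion purely from observed features.

That objective suits classification. Targeting decisions, however, depend on how much contact changes a customer's conversion probability, not on the probability itself.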
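To see what the naive score mixes together, we make some assumptions that are not part of the notebook's data schema, which names no treatment column. Suppose each customer has a binary treatment indicator \(T\), potential outcomes \(Y(1)\) and \(Y(0)\), and propensity \(e(X) = P(T=1 \mid X)\), and suppose treatment is unconfounded given \(X\). The symbols \(\mu_0\), \(\mu_1\) and \(\tau\) are introduced here only for illustration. Under these assumptions:

\[
\begin{aligned}
P(Y=1 \mid X) &= (1 - e(X))\,P(Y=1 \mid X, T=0) + e(X)\,P(Y=1 \mid X, T=1) \\
&= (1 - e(X))\,\mu_0(X) + e(X)\,\mu_1(X) \\
&= \mu_0(X) + e(X)\,\tau(X),
\end{aligned}
\]

where \(\mu_t(X) = P(Y(t)=1 \mid X)\) and \(\tau(X) = \mu_1(X) - \mu_0(X) = \mathbb{E}[Y(1) - Y(0) \mid X]\).

The naive score equals the baseline conversion rate plus the uplift scaled by the propensity. A customer with a high \(\mu_0(X)\) can therefore outrank one with a much larger \(\tau(X)\).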

In [None]:
X = df[["age", "income", "tenure", "usage"]]
y = df["outcome"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

model = RandomForestClassifier(n_estimators=200, random_state=42)
model.fit(X_train, y_train)

preds = model.predict_proba(X_test)[:, 1]
auc = roc_auc_score(y_test, preds)

print("Naive Outcome Model AUC:", auc)

### Interpretation of AUC

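AUC is the probability that the model scores a randomly chosen converter above a randomly chosen non-converter, with ties counting as one half. It is computed from outcomes and scores alone. The treatment indicator never enters.

A high AUC therefore does not imply good targeting for a marketing intervention.

The model may assign its highest scores to customers who would convert regardless of treatment.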

Thus, predictive performance ≠ incremental value.
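The pairwise definition can be checked directly. The minimal sketch below uses made-up labels and scores, and the helper name `pairwise_auc` is hypothetical. It agrees with scikit-learn's `roc_auc_score` and makes plain that only outcomes and scores enter the metric:

```python
import numpy as np
from sklearn.metrics import roc_auc_score


def pairwise_auc(y_true, scores):
    """AUC as P(score of a random converter > score of a random non-converter); ties count 1/2."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pos = scores[y_true == 1]  # scores of converters
    neg = scores[y_true == 0]  # scores of non-converters
    # Compare every (converter, non-converter) pair; no treatment information is used.
    diff = pos[:, None] - neg[None, :]
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 3 converters, 3 non-converters.
y_true = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.4, 0.7, 0.3, 0.6, 0.2]
print(pairwise_auc(y_true, scores), roc_auc_score(y_true, scores))
```

Here two converters outrank every non-converter and one converter outranks none, which gives 6 of 9 pairs, or about 0.667.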

In [None]:
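from sklearn.model_selection import cross_val_predict
# Score each customer with a model fitted on the other folds; in-sample random forest scores would overstate the naive profit.
df["naive_score"] = cross_val_predict(model, X, y, cv=5, method="predict_proba")[:, 1]
df_sorted = df.sort_values("naive_score", ascending=False)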

## Simulated Campaign Economics

We define the following business parameters:

- Cost per targeted customer: 10
- Margin per conversion: 60
- Target budget: top 30% of customers

Profit is defined as:

\[
\text{Profit} = (\text{Conversions} \times \text{Margin}) - (\text{Targets} \times \text{Cost})
\]

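Fixing the budget at 30% means every strategy contacts the same number of customers and differs only in whom it selects. Note that this formula credits every observed conversion among targeted customers, whether or not the contact caused it.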
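As a worked check on these parameters, write \(k\) for the number of targeted customers and \(\bar{y}_k\) for the conversion rate among them, so that Conversions \(= k\,\bar{y}_k\). Both symbols are introduced here only for illustration:

\[
\text{Profit} = k\,\bar{y}_k \times 60 - k \times 10 = k\,(60\,\bar{y}_k - 10) > 0 \iff \bar{y}_k > \tfrac{10}{60} \approx 16.7\%
\]

A targeted group must convert at more than one in six just to break even. If Conversions is replaced by incremental conversions, the same algebra says the average uplift among targeted customers must exceed 1/6 for the campaign to pay for itself.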

In [None]:
COST = 10
MARGIN = 60
TARGET_RATIO = 0.3

top_k = int(len(df) * TARGET_RATIO)
targeted = df_sorted.head(top_k)

revenue = targeted["outcome"].sum() * MARGIN
cost = len(targeted) * COST

profit = revenue - cost

print("Naive Strategy Profit:", profit)
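The 30% budget is one point on a curve. The minimal sketch below computes the same profit formula for every budget size \(k\) from outcomes already sorted by descending score. The helper name `profit_curve` and the toy outcomes are hypothetical. With the notebook's data, the input would be `df_sorted["outcome"]`, and the value at \(k =\) `top_k` would equal the profit printed above.

```python
import numpy as np

# Campaign parameters from the economics section above.
COST, MARGIN = 10, 60


def profit_curve(outcome_ranked):
    """Profit of targeting the top-k customers for every k, using
    (Conversions x Margin) - (Targets x Cost).

    outcome_ranked: 0/1 outcomes sorted by descending model score.
    Entry k-1 of the result is the profit of targeting the first k customers.
    """
    outcome_ranked = np.asarray(outcome_ranked, dtype=float)
    conversions = np.cumsum(outcome_ranked)          # conversions among the top k
    targets = np.arange(1, len(outcome_ranked) + 1)  # number of customers targeted, k
    return conversions * MARGIN - targets * COST


# Toy ranked outcomes (hypothetical).
curve = profit_curve([1, 1, 0, 1, 0, 0])
best_k = int(np.argmax(curve)) + 1
print(curve, best_k)
```

Like the cell above, this curve counts observed rather than incremental conversions, so it shares the naive strategy's blind spot.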

### Naive Strategy Interpretation

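Customers are targeted in descending order of predicted conversion probability.

This strategy, and the profit calculation above, implicitly assume two things: that the customers most likely to convert are the ones the campaign persuades, and that every observed conversion among them is caused by targeting.

Neither assumption holds in general. Some customers convert without contact, and baseline conversion and response to treatment can vary independently across customers. Confounded historical treatment assignment blurs the picture further, because observed conversions then also reflect who was chosen for treatment.

The model may target:

- Sure buyers, who would convert without contact (wasted spend)
- Other customers whose behavior the campaign does not change

The incremental effect of contact is never isolated.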
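The sketch below makes the wasted spend concrete with three hypothetical segments. Their conversion probabilities are invented for illustration, and it assumes a historical campaign that treated half of all customers at random. Under that assumption, an outcome model estimates \(P(Y=1 \mid X) = p_0 + 0.5\,(p_1 - p_0)\). The code compares each segment's naive score with its per-contact profit under the formula above and with its incremental profit:

```python
# Campaign parameters from the economics section above.
COST, MARGIN = 10, 60
TREAT_RATE = 0.5  # assumption: half of historical customers were treated at random

# Hypothetical segments: (name, P(convert | not contacted), P(convert | contacted)).
segments = [
    ("sure buyers", 0.90, 0.90),
    ("persuadables", 0.10, 0.50),
    ("lost causes", 0.02, 0.02),
]

rows = []
for name, p0, p1 in segments:
    uplift = p1 - p0
    naive_score = p0 + TREAT_RATE * uplift        # what an outcome model estimates: P(Y=1 | X)
    observed_profit = p1 * MARGIN - COST          # per contact, counting every conversion
    incremental_profit = uplift * MARGIN - COST   # per contact, counting only caused conversions
    rows.append((name, naive_score, observed_profit, incremental_profit))

# Print segments in the order the naive model would target them.
for name, score, obs, inc in sorted(rows, key=lambda r: r[1], reverse=True):
    print(f"{name:<13} score={score:.2f} observed={obs:6.1f} incremental={inc:6.1f}")
```

With these assumed numbers the naive ranking puts sure buyers first, and the profit formula credits each of their contacts with 44. In incremental terms, each of those contacts loses 10. Persuadables rank second on score but are the only segment whose contact earns money incrementally, at 14 per customer.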

## Random Targeting Baseline

Random targeting provides a neutral benchmark.

A model must outperform random allocation under an identical budget to justify its added complexity.

In [None]:
# Same budget as the naive strategy: exactly top_k customers (frac= rounds and can differ by one).
df_random = df.sample(n=top_k, random_state=42)

revenue_random = df_random["outcome"].sum() * MARGIN
cost_random = len(df_random) * COST

profit_random = revenue_random - cost_random

print("Random Targeting Profit:", profit_random)
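The random-targeting profit above comes from a single draw, so it carries sampling noise. The minimal sketch below repeats the draw many times on stand-in outcomes, which assume a 20% conversion rate and are not the notebook's data. It then compares the average with the analytic expectation \(k\,(\bar{y} \times \text{Margin} - \text{Cost})\), where \(k\) is the number of customers targeted and \(\bar{y}\) the overall conversion rate. The helper name `random_targeting_profits` is hypothetical.

```python
import numpy as np

# Campaign parameters from the economics section above.
COST, MARGIN, TARGET_RATIO = 10, 60, 0.3


def random_targeting_profits(outcome, n_draws=1000, seed=0):
    """Profit of n_draws random selections, each the same size as the 30% budget."""
    rng = np.random.default_rng(seed)
    outcome = np.asarray(outcome)
    k = int(len(outcome) * TARGET_RATIO)
    profits = np.empty(n_draws)
    for i in range(n_draws):
        # Sample k distinct customers, as df.sample does.
        idx = rng.choice(len(outcome), size=k, replace=False)
        profits[i] = outcome[idx].sum() * MARGIN - k * COST
    return profits


# Stand-in outcomes (assumption: 20% base conversion rate); with the notebook's
# data this would be df["outcome"].to_numpy().
outcome = np.random.default_rng(1).binomial(1, 0.2, size=10_000)
profits = random_targeting_profits(outcome)

k = int(len(outcome) * TARGET_RATIO)
expected = k * (outcome.mean() * MARGIN - COST)
print(f"analytic expectation: {expected:,.0f}")
print(f"simulated mean: {profits.mean():,.0f}")
print(f"5th-95th percentile: {np.percentile(profits, 5):,.0f} to {np.percentile(profits, 95):,.0f}")
```

The percentile range shows how far a single random draw can land from its expectation. A model's advantage over random targeting is best judged against that spread.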

## Limitation of Predictive Targeting

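The naive model estimates the following quantity and ranks customers by it:

\[
P(Y=1 \mid X)
\]

Causal targeting requires ranking by the conditional average treatment effect (CATE), where \(Y(1)\) and \(Y(0)\) are a customer's outcomes with and without treatment:

\[
\mathbb{E}[Y(1) - Y(0) \mid X]
\]

This distinction separates correlation from incremental impact. A customer with a high conversion probability but zero treatment effect sits near the top of the first ranking and adds nothing under the second.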
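The cost of ranking by the wrong quantity can be sketched on a hypothetical population whose potential-outcome probabilities are known by construction. The curves below are invented for illustration. High-\(x\) customers buy anyway, and low-\(x\) customers respond most. The sketch also assumes historical treatment was randomized 50/50, so the outcome model's target is \(P(Y=1 \mid x) = p_0(x) + 0.5\,\tau(x)\). Both rankings spend the same 30% budget, and each is scored by expected incremental profit.

```python
import numpy as np

# Campaign parameters from the economics section above.
COST, MARGIN, TARGET_RATIO = 10, 60, 0.3
TREAT_RATE = 0.5  # assumption: historical treatment was randomized 50/50

# Hypothetical population with known potential-outcome probabilities.
x = np.linspace(0, 1, 10_000)        # a single illustrative feature
p0 = 0.05 + 0.8 * x                  # P(Y(0)=1 | x): high-x customers buy anyway
tau = 0.4 * (1 - x)                  # E[Y(1) - Y(0) | x]: low-x customers respond most
naive_score = p0 + TREAT_RATE * tau  # P(Y=1 | x) under randomized assignment


def incremental_profit(ranking_score):
    """Expected incremental profit of targeting the top TARGET_RATIO by a score."""
    k = int(len(ranking_score) * TARGET_RATIO)
    top = np.argsort(-ranking_score)[:k]  # indices of the k highest scores
    return tau[top].sum() * MARGIN - k * COST


naive = incremental_profit(naive_score)
oracle = incremental_profit(tau)
print(f"rank by P(Y=1 | x):  {naive:,.0f}")
print(f"rank by true uplift: {oracle:,.0f}")
```

With these made-up curves, ranking by \(P(Y=1 \mid x)\) selects the high-\(x\) customers, whose uplift is small, and loses money in incremental terms. Ranking by the true uplift is profitable under the same budget. In practice \(\tau(x)\) is unknown and must be estimated.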

Next step: implement uplift modeling to estimate heterogeneous treatment effects.