# A/B Test Notebook — Prime Overnight Delivery Emphasis (Simulated)

This notebook simulates and analyzes an A/B test inspired by delivery messaging on Amazon product detail pages (PDPs):
- **Control (A):** standard layout showing “Free delivery over $35” and “Prime Overnight 4–8am” with similar prominence
- **Treatment (B):** Prime Overnight emphasized near the Buy Box; standard free delivery de-emphasized

**Goal:** show clean A/B test methodology (randomization, metrics, confidence intervals, guardrails, segmentation).  
**Note:** Outcomes are simulated for portfolio purposes.


In [None]:
import numpy as np
import pandas as pd
import math
from dataclasses import dataclass

## 1) Experiment configuration

We simulate:
- user segment: Prime vs non‑Prime
- behavior funnel: PDP view → Add to Cart → Checkout start
- guardrails: bounce rate, time on page

You can adjust baselines, treatment effects, and sample size to see how conclusions change.
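Before you adjust anything, it helps to check whether `n_users` gives enough power to detect the configured effects. The sketch below uses the standard normal-approximation sample-size formula for a two-sided two-proportion z-test, assuming α = 0.05 and 80% power. `n_per_arm` is a hypothetical helper. The example plugs in the Prime ATC baseline (0.125) and the +5% relative lift from the config. It ignores the bounce adjustment the simulation applies to ATC, which lowers the realized baseline and would push the requirement higher.

```python
import math
from statistics import NormalDist

def n_per_arm(p1: float, p2: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate users per arm for a two-sided two-proportion z-test (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p2 - p1) ** 2)

# Prime ATC baseline 0.125 with a +5% relative lift (0.13125)
needed = n_per_arm(0.125, 0.125 * 1.05)
print(needed)
```

With these inputs the estimate is roughly 45,000 Prime users per arm. The default config produces only about 8,400 Prime users per arm (`n_users = 40000`, `prime_share = 0.42`), so the Prime-segment result in this notebook is likely underpowered. Raising `n_users` is one way to see how the conclusions change.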


In [None]:
@dataclass
class Config:
    n_users: int = 40000
    prime_share: float = 0.42  # fraction of users who are Prime (assumption)

    # Baselines (Control)
    atc_prime: float = 0.125
    atc_nonprime: float = 0.085

    checkout_given_atc_prime: float = 0.55
    checkout_given_atc_nonprime: float = 0.50

    bounce_prime: float = 0.36
    bounce_nonprime: float = 0.44

    time_on_pdp_mean_prime: float = 48  # seconds
    time_on_pdp_mean_nonprime: float = 42

    # Treatment effects (multipliers or deltas)
    # Example: treatment increases Prime ATC by +5% relative, and slightly hurts non‑Prime ATC by -2% relative
    atc_prime_mult_B: float = 1.05
    atc_nonprime_mult_B: float = 0.98

    bounce_prime_delta_B: float = 0.00
    bounce_nonprime_delta_B: float = 0.01  # +1pp bounce risk for non‑Prime

    time_mean_delta_B: float = 0.0  # change in seconds

cfg = Config()
cfg

## 2) Simulate data

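Each user is assigned to A or B independently with probability 0.5, so the split is close to 50/50 but not exact. In production, assignment must also be sticky: a returning user always sees the same variant. Here each simulated user appears only once, so stickiness holds trivially. Outcomes are then drawn from the probability models below.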
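A common way to make assignment sticky in production is to hash a stable user identifier together with an experiment key. The same user then always lands in the same arm, and no lookup table is needed. The sketch below assumes a string `user_id` and a hypothetical experiment key `"prime_overnight_pdp_v1"`. It uses SHA-256 from the standard library only as a well-mixed hash, not because any particular platform requires it.

```python
import hashlib

def assign_variant(user_id: str, experiment_key: str = "prime_overnight_pdp_v1",
                   treatment_share: float = 0.5) -> str:
    """Deterministically map a user to 'A' or 'B' (the experiment key is hypothetical)."""
    digest = hashlib.sha256(f"{experiment_key}:{user_id}".encode("utf-8")).hexdigest()
    # Treat the first 8 hex digits as a uniform bucket in [0, 1)
    bucket = int(digest[:8], 16) / 16**8
    return "B" if bucket < treatment_share else "A"

# Same user and experiment key: same variant on every call
print(assign_variant("user-123"), assign_variant("user-123"))
```

Salting the hash with the experiment key means a new experiment reshuffles users. Without the salt, the same users would land in treatment across every test.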


In [None]:
rng = np.random.default_rng(7)

n = cfg.n_users
variant = rng.choice(["A","B"], size=n, replace=True)  # 50/50 by default
is_prime = rng.random(n) < cfg.prime_share

# Baseline probabilities per segment
p_atc_A = np.where(is_prime, cfg.atc_prime, cfg.atc_nonprime)

# Apply treatment effects
p_atc_B = np.where(
    is_prime,
    np.clip(cfg.atc_prime * cfg.atc_prime_mult_B, 0, 1),
    np.clip(cfg.atc_nonprime * cfg.atc_nonprime_mult_B, 0, 1)
)

p_atc = np.where(variant=="A", p_atc_A, p_atc_B)

# Guardrails: bounce
p_bounce_A = np.where(is_prime, cfg.bounce_prime, cfg.bounce_nonprime)
p_bounce_B = np.where(
    is_prime,
    np.clip(cfg.bounce_prime + cfg.bounce_prime_delta_B, 0, 1),
    np.clip(cfg.bounce_nonprime + cfg.bounce_nonprime_delta_B, 0, 1)
)
p_bounce = np.where(variant=="A", p_bounce_A, p_bounce_B)

# Outcomes
bounced = rng.random(n) < p_bounce

# Bounced users cannot add to cart, so ATC is drawn only for non-bounced users.
# The realized ATC rate is therefore p_atc * (1 - p_bounce), below the configured baseline.
atc = (rng.random(n) < p_atc) & (~bounced)

# Checkout starts only if ATC happened
p_checkout = np.where(is_prime, cfg.checkout_given_atc_prime, cfg.checkout_given_atc_nonprime)
checkout_start = (rng.random(n) < p_checkout) & atc

# Time on page: lognormal (always positive, right-skewed)
base_mean = np.where(is_prime, cfg.time_on_pdp_mean_prime, cfg.time_on_pdp_mean_nonprime) + cfg.time_mean_delta_B*(variant=="B")
# LogNormal(mu, sigma) has mean exp(mu + sigma^2/2), so this mu makes the mean equal base_mean
sigma = 0.5
mu = np.log(np.maximum(base_mean, 1)) - 0.5*sigma*sigma
time_on_pdp = rng.lognormal(mean=mu, sigma=sigma)

df = pd.DataFrame({
    "variant": variant,
    "is_prime": is_prime,
    "bounced": bounced,
    "add_to_cart": atc,
    "checkout_start": checkout_start,
    "time_on_pdp_s": time_on_pdp
})
df.head()
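Because assignment is random, the A/B split will never be exactly 50/50. A large imbalance, called a sample ratio mismatch (SRM), usually signals an assignment or logging bug rather than chance. The sketch below is a chi-square goodness-of-fit check against the intended 50/50 split, using only the standard library. For 1 degree of freedom, the chi-square tail probability equals `erfc(sqrt(x / 2))`. The name `srm_pvalue` is hypothetical. Alerting at p < 0.001 is a common convention, not part of this notebook.

```python
import math

def srm_pvalue(n_a: int, n_b: int, expected_share_a: float = 0.5) -> float:
    """Chi-square (1 df) p-value for observed arm sizes vs. the intended split."""
    total = n_a + n_b
    exp_a = total * expected_share_a
    exp_b = total - exp_a
    chi2 = (n_a - exp_a) ** 2 / exp_a + (n_b - exp_b) ** 2 / exp_b
    # Survival function of the chi-square distribution with 1 degree of freedom
    return math.erfc(math.sqrt(chi2 / 2))

# In the notebook: counts = df.variant.value_counts(); srm_pvalue(counts["A"], counts["B"])
print(srm_pvalue(20_000, 20_000))  # perfectly balanced -> 1.0
print(srm_pvalue(20_500, 19_500))  # 500-user excess in A out of 40,000
```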

## 3) Helper functions: lift, CI, and tests for conversion metrics

For a binary metric (like ATC), we can compute:
- conversion rates in A and B
- absolute lift and relative lift
- a 95% CI for the difference in proportions (normal approx)

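(This normal-approximation, or Wald, interval is fine for a portfolio demonstration at these sample sizes. For small samples or rates near 0 or 1, production work would use a more robust interval, such as Newcombe's score-based interval for a difference in proportions.)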
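The helpers in the next cell compute a confidence interval but no p-value. The sketch below is a two-sided two-proportion z-test with a pooled standard error, assuming independent users. `two_prop_ztest` is a hypothetical helper that uses `math.erfc` for the two-sided normal tail. The pooled SE used here differs slightly from the unpooled SE in `diff_ci_95`, so right at the 5% boundary the test and the CI can disagree.

```python
import math
from typing import Tuple

def two_prop_ztest(x1: int, n1: int, x2: int, n2: int) -> Tuple[float, float]:
    """Return (z, two-sided p-value) for H0: p1 == p2, using a pooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p2 - p1) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value

# x1, x2 are success counts, e.g. int(a.sum()) and int(b.sum()) inside summarize_binary
z, p = two_prop_ztest(1000, 10_000, 1100, 10_000)
print(f"z = {z:.2f}, p = {p:.4f}")
```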


In [None]:
def rate(series: pd.Series) -> float:
    return float(series.mean())

def diff_ci_95(p1, n1, p2, n2):
    # 95% CI for (p2 - p1) using normal approximation
    se = math.sqrt((p1*(1-p1)/n1) + (p2*(1-p2)/n2))
    z = 1.96
    return (p2 - p1 - z*se, p2 - p1 + z*se)

def summarize_binary(df, col, segment_name="All"):
    a = df[df.variant=="A"][col]
    b = df[df.variant=="B"][col]
    p1, n1 = a.mean(), a.shape[0]
    p2, n2 = b.mean(), b.shape[0]
    abs_lift = p2 - p1
    rel_lift = abs_lift / p1 if p1 > 0 else np.nan
    ci = diff_ci_95(p1, n1, p2, n2)
    return {
        "segment": segment_name,
        "metric": col,
        "A_rate": p1,
        "B_rate": p2,
        "abs_lift": abs_lift,
        "rel_lift": rel_lift,
        "ci_low": ci[0],
        "ci_high": ci[1],
        "n_A": int(n1),
        "n_B": int(n2),
    }

def summarize_continuous(df, col, segment_name="All"):
    a = df[df.variant=="A"][col]
    b = df[df.variant=="B"][col]
    return {
        "segment": segment_name,
        "metric": col,
        "A_mean": float(a.mean()),
        "B_mean": float(b.mean()),
        "delta": float(b.mean() - a.mean()),
        "n_A": int(a.shape[0]),
        "n_B": int(b.shape[0]),
    }
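`summarize_binary` reports `rel_lift` as a point estimate only. One common way to put an interval around it is to apply the delta method to the log of the rate ratio and then exponentiate back. The sketch below assumes independent arms and the normal approximation. `rel_lift_ci_95` is a hypothetical helper whose inputs match the `p1, n1, p2, n2` arguments of `diff_ci_95`.

```python
import math
from typing import Tuple

def rel_lift_ci_95(p1: float, n1: int, p2: float, n2: int) -> Tuple[float, float]:
    """95% CI for relative lift (p2 / p1 - 1) via the delta method on log(p2 / p1)."""
    if p1 <= 0 or p2 <= 0:
        return (float("nan"), float("nan"))  # log ratio is undefined when a rate is 0
    log_ratio = math.log(p2 / p1)
    # Var(log p_hat) is approximately (1 - p) / (n * p) for each arm
    se = math.sqrt((1 - p1) / (n1 * p1) + (1 - p2) / (n2 * p2))
    z = 1.96
    return (math.exp(log_ratio - z * se) - 1, math.exp(log_ratio + z * se) - 1)

low, high = rel_lift_ci_95(0.10, 20_000, 0.105, 20_000)
print(f"relative lift 95% CI: [{low:.1%}, {high:.1%}]")
```

Working on the log scale keeps the interval's lower end above -100%. It also makes the interval asymmetric around the point estimate, which matches how ratios behave.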

## 4) Overall results (primary + guardrails)

In [None]:
rows = []
rows.append(summarize_binary(df, "add_to_cart", "All"))
rows.append(summarize_binary(df, "checkout_start", "All"))
rows.append(summarize_binary(df, "bounced", "All"))
res = pd.DataFrame(rows)
res

Interpretation tips:
- For success metrics (ATC / checkout), you want **positive** lift.
- For bounce, you want **negative** lift (lower is better).
- If the 95% CI for the difference includes **0**, the difference is not statistically significant at the 5% level. Treat it as inconclusive, not as evidence of no effect.


In [None]:
# Convert rates, lifts, and CI bounds to percent for readability
# (abs_lift and the CI bounds become percentage points; rel_lift becomes a percent change)
pretty = res.copy()
for c in ["A_rate","B_rate","abs_lift","rel_lift","ci_low","ci_high"]:
    pretty[c] = (pretty[c].astype(float) * 100).round(3)
pretty[["segment","metric","n_A","n_B",
        "A_rate","B_rate","abs_lift","rel_lift","ci_low","ci_high"]]

## 5) Segmentation: Prime vs non‑Prime

This is the key nuance for an Amazon-style test: a Prime-forward UI may help Prime users but hurt non‑Prime users, and the two effects can partly cancel out in the overall numbers.
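Per-segment results alone do not show that the two segments respond differently. One segment's lift can be significant and the other's not, even when the two lifts are statistically indistinguishable. The sketch below is a direct check: it treats the segments as independent samples and runs a z-test on the difference between the Prime and non‑Prime absolute lifts. `lift_difference_test` is a hypothetical helper that reads the `A_rate`, `B_rate`, `n_A`, `n_B` fields `summarize_binary` returns. The example inputs are illustrative numbers, not results from this notebook.

```python
import math

def lift_se(p_a: float, n_a: int, p_b: float, n_b: int) -> float:
    """Unpooled standard error of the absolute lift p_b - p_a (same form as diff_ci_95)."""
    return math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)

def lift_difference_test(seg1: dict, seg2: dict) -> dict:
    """z-test for H0: lift in seg1 == lift in seg2 (segments are independent samples)."""
    lift1 = seg1["B_rate"] - seg1["A_rate"]
    lift2 = seg2["B_rate"] - seg2["A_rate"]
    se = math.sqrt(lift_se(seg1["A_rate"], seg1["n_A"], seg1["B_rate"], seg1["n_B"]) ** 2
                   + lift_se(seg2["A_rate"], seg2["n_A"], seg2["B_rate"], seg2["n_B"]) ** 2)
    z = (lift1 - lift2) / se
    return {"lift_diff": lift1 - lift2, "z": z, "p_value": math.erfc(abs(z) / math.sqrt(2))}

# After the segmentation cell runs: lift_difference_test(seg_rows[0], seg_rows[1])  # ATC, Prime vs non-Prime
prime = {"A_rate": 0.080, "B_rate": 0.084, "n_A": 8400, "n_B": 8400}         # illustrative only
nonprime = {"A_rate": 0.048, "B_rate": 0.047, "n_A": 11600, "n_B": 11600}    # illustrative only
print(lift_difference_test(prime, nonprime))
```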


In [None]:
seg_rows = []
seg_rows.append(summarize_binary(df[df.is_prime], "add_to_cart", "Prime"))
seg_rows.append(summarize_binary(df[~df.is_prime], "add_to_cart", "Non‑Prime"))
seg_rows.append(summarize_binary(df[df.is_prime], "bounced", "Prime"))
seg_rows.append(summarize_binary(df[~df.is_prime], "bounced", "Non‑Prime"))
seg = pd.DataFrame(seg_rows)
seg

## 6) Guardrail: time on PDP (continuous)

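Here we compare means only, with no confidence interval. Time on page is right-skewed (the simulation draws it from a lognormal), so in production you would also look at the full distribution, medians, and outliers.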
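`summarize_continuous` reports a difference in means without any measure of uncertainty. The sketch below computes a 95% CI for that difference using Welch's unequal-variance standard error and a normal critical value. At thousands of users per arm, the t and normal quantiles nearly coincide, so the normal value is a reasonable shortcut. `mean_diff_ci_95` is a hypothetical helper that takes the same `df`, `col` inputs as the other summaries. The toy data below is generated, not taken from the notebook's run.

```python
import math
import numpy as np
import pandas as pd
from typing import Tuple

def mean_diff_ci_95(df: pd.DataFrame, col: str) -> Tuple[float, float]:
    """95% CI for mean(B) - mean(A) with a Welch (unequal-variance) standard error."""
    a = df.loc[df["variant"] == "A", col].to_numpy(dtype=float)
    b = df.loc[df["variant"] == "B", col].to_numpy(dtype=float)
    delta = b.mean() - a.mean()
    se = math.sqrt(a.var(ddof=1) / a.size + b.var(ddof=1) / b.size)
    z = 1.96  # normal approximation; fine with thousands of users per arm
    return (delta - z * se, delta + z * se)

# Toy check: lognormal times with a mean of about 48 s in both arms
toy_rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "variant": np.repeat(["A", "B"], 5000),
    "time_on_pdp_s": toy_rng.lognormal(mean=np.log(48) - 0.125, sigma=0.5, size=10_000),
})
print(mean_diff_ci_95(toy, "time_on_pdp_s"))
```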


In [None]:
time_rows = []
time_rows.append(summarize_continuous(df, "time_on_pdp_s", "All"))
time_rows.append(summarize_continuous(df[df.is_prime], "time_on_pdp_s", "Prime"))
time_rows.append(summarize_continuous(df[~df.is_prime], "time_on_pdp_s", "Non‑Prime"))
pd.DataFrame(time_rows)

## 7) Decision write-up (template)

Fill this in with your run’s numbers.

- If ATC improves and bounce doesn’t worsen (especially for non‑Prime), consider shipping.
- If Prime improves but non‑Prime worsens, consider targeted rollout or a softer Prime pitch.
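The two rules above can be encoded as a small helper that reads the CI bounds `summarize_binary` already returns. The sketch below assumes a hypothetical policy. It recommends shipping when the overall ATC CI lies entirely above 0, the non‑Prime bounce CI stays below a tolerance, and non‑Prime ATC is not clearly hurt. It recommends a targeted rollout when Prime clearly improves but non‑Prime is hurt. The function name `recommend` and the 0.5 pp bounce tolerance are illustrative assumptions, not a standard.

```python
def recommend(overall_atc: dict, prime_atc: dict, nonprime_atc: dict,
              nonprime_bounce: dict, bounce_tolerance: float = 0.005) -> str:
    """Map CI bounds to a recommendation (hypothetical policy; thresholds are assumptions)."""
    atc_up = overall_atc["ci_low"] > 0                          # whole CI above zero
    bounce_ok = nonprime_bounce["ci_high"] <= bounce_tolerance  # bounce rise stays under tolerance
    prime_up = prime_atc["ci_low"] > 0
    nonprime_down = nonprime_atc["ci_high"] < 0                 # whole CI below zero
    if atc_up and bounce_ok and not nonprime_down:
        return "ship"
    if prime_up and (nonprime_down or not bounce_ok):
        return "targeted rollout (Prime only) or softer Prime pitch"
    return "do not ship yet: inconclusive or harmful"

# After the next cell runs:
# recommend(overall_atc, summarize_binary(df[df.is_prime], "add_to_cart", "Prime"),
#           nonprime_atc, nonprime_bounce)
```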


In [None]:
def decision_template(overall_atc, nonprime_atc, nonprime_bounce):
    return f"""Decision summary (example wording):

Overall ATC moved from {overall_atc['A_rate']:.3%} → {overall_atc['B_rate']:.3%} (abs lift {overall_atc['abs_lift']:.3%}, 95% CI [{overall_atc['ci_low']:.3%}, {overall_atc['ci_high']:.3%}]).

Non‑Prime ATC moved from {nonprime_atc['A_rate']:.3%} → {nonprime_atc['B_rate']:.3%} (abs lift {nonprime_atc['abs_lift']:.3%}, 95% CI [{nonprime_atc['ci_low']:.3%}, {nonprime_atc['ci_high']:.3%}]).

Non‑Prime bounce moved from {nonprime_bounce['A_rate']:.3%} → {nonprime_bounce['B_rate']:.3%} (abs lift {nonprime_bounce['abs_lift']:.3%}, 95% CI [{nonprime_bounce['ci_low']:.3%}, {nonprime_bounce['ci_high']:.3%}]).

I would (ship / not ship / do targeted rollout) because (reason). Next test: move the Prime pitch to cart post‑ATC or restrict the treatment to urgency categories.
"""

overall_atc = summarize_binary(df, "add_to_cart", "All")
nonprime_atc = summarize_binary(df[~df.is_prime], "add_to_cart", "Non‑Prime")
nonprime_bounce = summarize_binary(df[~df.is_prime], "bounced", "Non‑Prime")

print(decision_template(overall_atc, nonprime_atc, nonprime_bounce))