# RDD Analysis: Formal Estimation and Sensitivity Checks

## What I'm Doing Here

In notebook 01, I generated synthetic data and found no evidence of manipulation at the cutoff. Now I'm running the actual RDD analysis to estimate the treatment effect.

**My learning goal:** Understand how to:
- Estimate treatment effects with proper standard errors
- Test if results are sensitive to analytical choices (bandwidth, controls)
- Validate that I'm not finding spurious effects (placebo tests)

**Known truth:** I built an 8 percentage point effect into the data. Can I recover it?

## Setup and Data Loading

In [1]:
import pandas as pd
import numpy as np
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import statsmodels.formula.api as smf
from scipy import stats

# Load the data I generated in notebook 01
df = pd.read_csv('../data/rdd_ecommerce.csv')

print(f"Loaded {len(df):,} shopping sessions")
print(f"True treatment effect: 8.0 percentage points (built into data)")

Loaded 10,000 shopping sessions
True treatment effect: 8.0 percentage points (built into data)


## Step 1: Prepare Data for RDD Regression

**The RDD specification I'm using:**

```
Y = β₀ + β₁(cart_value - 50) + β₂(treatment) + β₃(treatment × (cart_value - 50)) + ε
```

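**Why center at 50?** The centered running variable is zero at the cutoff, so the slope and interaction terms drop out there and β₂ is directly interpretable as the treatment effect AT the cutoff.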

**Why the interaction?** Allows the relationship between cart value and completion to differ on each side of the cutoff (different slopes).
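
To convince myself that centering matters, here is a minimal sketch on made-up, noiseless data (the coefficients 0.40, 0.002, 0.08 and 0.003 are illustrative assumptions, not values from my dataset). It fits the same model with and without centering and compares the treatment coefficient.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5_000
cutoff = 50.0

# Hypothetical noiseless outcome: 0.08 jump at the cutoff, different slopes per side.
cart_value = rng.uniform(30, 70, n)
treatment = (cart_value >= cutoff).astype(float)
centered = cart_value - cutoff
y = 0.40 + 0.002 * centered + 0.08 * treatment + 0.003 * treatment * centered

def ols(X, y):
    """Least-squares coefficients via numpy."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

ones = np.ones(n)

# Centered: the treatment coefficient is the jump at cart_value = 50.
beta_centered = ols(
    np.column_stack([ones, centered, treatment, treatment * centered]), y
)

# Uncentered: the treatment coefficient is the gap between the two fitted
# lines extrapolated to cart_value = 0, far outside the data.
beta_raw = ols(
    np.column_stack([ones, cart_value, treatment, treatment * cart_value]), y
)

print(f"centered   treatment coef: {beta_centered[2]:.4f}")
print(f"uncentered treatment coef: {beta_raw[2]:.4f}")
print(f"uncentered jump at cutoff: {beta_raw[2] + beta_raw[3] * cutoff:.4f}")
```

Both fits describe the same two lines. Only the centered version reports the jump at €50 as a single coefficient with its own standard error.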

**Initial choice: Bandwidth = €20**

I'm starting with €20 bandwidth (comparing €30-70). This is somewhat arbitrary - I'll test sensitivity to this choice later.

In [2]:
cutoff = 50.0
bandwidth = 20.0

# Filter to bandwidth window
df_rdd = df[
    (df['cart_value'] >= cutoff - bandwidth) & 
    (df['cart_value'] <= cutoff + bandwidth)
].copy()

# Center running variable at cutoff
df_rdd['cart_centered'] = df_rdd['cart_value'] - cutoff

# Create interaction term
df_rdd['treat_x_cart'] = df_rdd['treatment'] * df_rdd['cart_centered']

print(f"Sample size in €{cutoff-bandwidth:.0f}-{cutoff+bandwidth:.0f} window: {len(df_rdd):,}")
print(f"  Control (cart < €50): {(df_rdd['treatment']==0).sum():,}")
print(f"  Treatment (cart ≥ €50): {(df_rdd['treatment']==1).sum():,}")
print(f"\nGood sample size for estimation.")

Sample size in €30-70 window: 5,835
  Control (cart < €50): 3,050
  Treatment (cart ≥ €50): 2,785

Good sample size for estimation.


## Step 2: Run the RDD Regression

This is the core of the analysis. I'm using OLS to estimate the discontinuity at the cutoff.
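
One caveat about standard errors: OLS on a binary outcome is a linear probability model, and its error variance p(1-p) depends on the fitted probability, so the default (nonrobust) standard errors are not strictly valid. The sketch below computes classical and heteroskedasticity-robust (HC1) standard errors side by side. It uses hypothetical stand-in data rather than my CSV, and plain numpy so the formula is visible.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4_000

# Hypothetical stand-in data: binary outcome, running variable centered
# at the cutoff, 0.08 jump at zero (all numbers are assumptions).
centered = rng.uniform(-20, 20, n)
treatment = (centered >= 0).astype(float)
p = 0.45 + 0.003 * centered + 0.08 * treatment
y = rng.binomial(1, p).astype(float)

X = np.column_stack([np.ones(n), centered, treatment, treatment * centered])
k = X.shape[1]

XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ y
resid = y - X @ beta

# Classical covariance: sigma^2 * (X'X)^-1
sigma2 = resid @ resid / (n - k)
se_classical = np.sqrt(np.diag(sigma2 * XtX_inv))

# HC1 sandwich: (X'X)^-1 X' diag(e^2) X (X'X)^-1, scaled by n / (n - k)
meat = (X * resid[:, None] ** 2).T @ X
se_hc1 = np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv * n / (n - k)))

print(f"treatment coef:        {beta[2]:.4f}")
print(f"classical std error:   {se_classical[2]:.4f}")
print(f"HC1 robust std error:  {se_hc1[2]:.4f}")
```

In statsmodels the same robust option is available through `.fit(cov_type='HC1')`. When completion rates sit near 50%, p(1-p) is almost flat, so I'd expect the two sets of standard errors to be close, but that is worth checking rather than assuming.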

In [3]:
# Estimate RDD model
formula = 'completed_purchase ~ cart_centered + treatment + treat_x_cart'
model = smf.ols(formula, data=df_rdd).fit()

print("="*70)
print("RDD REGRESSION RESULTS")
print("="*70)
print(model.summary())

RDD REGRESSION RESULTS
                            OLS Regression Results                            
Dep. Variable:     completed_purchase   R-squared:                       0.011
Model:                            OLS   Adj. R-squared:                  0.010
Method:                 Least Squares   F-statistic:                     21.00
Date:                Sat, 29 Nov 2025   Prob (F-statistic):           1.59e-13
Time:                        15:39:30   Log-Likelihood:                -4203.3
No. Observations:                5835   AIC:                             8415.
Df Residuals:                    5831   BIC:                             8441.
Df Model:                           3                                         
Covariance Type:            nonrobust                                         
                    coef    std err          t      P>|t|      [0.025      0.975]
---------------------------------------------------------------------------------
Intercept         0.478

## Extract and Interpret Key Results

In [4]:
# Extract treatment effect estimate
treatment_effect = model.params['treatment']
std_error = model.bse['treatment']
t_stat = model.tvalues['treatment']
p_value = model.pvalues['treatment']
ci_lower, ci_upper = model.conf_int().loc['treatment']

print("\n" + "="*70)
print("RDD TREATMENT EFFECT ESTIMATE")
print("="*70)
print(f"Point Estimate: {treatment_effect:.4f} ({treatment_effect*100:.2f} percentage points)")
print(f"Standard Error: {std_error:.4f}")
print(f"95% Confidence Interval: [{ci_lower:.4f}, {ci_upper:.4f}]")
print(f"                         [{ci_lower*100:.2f}pp, {ci_upper*100:.2f}pp]")
print(f"t-statistic: {t_stat:.3f}")
print(f"p-value: {p_value:.4f}")
print(f"\nStatistically significant? {'YES' if p_value < 0.05 else 'NO'} (at 5% level)")

print("\n" + "="*70)
print("COMPARISON TO TRUE EFFECT")
print("="*70)
print(f"True effect (from data generation): 8.0 percentage points")
print(f"My RDD estimate: {treatment_effect*100:.2f} percentage points")
print(f"Estimation error: {abs(treatment_effect*100 - 8.0):.2f} percentage points")

if ci_lower*100 <= 8.0 <= ci_upper*100:
    print(f"\n✓ Confidence interval captures the true effect!")
else:
    print(f"\n✗ Confidence interval missed the true effect (unusual with good RDD)")


RDD TREATMENT EFFECT ESTIMATE
Point Estimate: 0.0684 (6.84 percentage points)
Standard Error: 0.0254
95% Confidence Interval: [0.0187, 0.1181]
                         [1.87pp, 11.81pp]
t-statistic: 2.697
p-value: 0.0070

Statistically significant? YES (at 5% level)

COMPARISON TO TRUE EFFECT
True effect (from data generation): 8.0 percentage points
My RDD estimate: 6.84 percentage points
Estimation error: 1.16 percentage points

✓ Confidence interval captures the true effect!


## My Interpretation

**What I found:**
- Free shipping increases purchase completion by an estimated 6.84 percentage points (95% CI: 1.87pp to 11.81pp) for customers near the €50 threshold
- The effect is statistically significant (p = 0.007)
- My confidence interval includes the true 8pp effect I built into the data

**Why isn't my point estimate exactly 8.0pp?**
- Random sampling variation (only 5,835 of the 10,000 sessions fall inside the €30-70 window)
- I added realistic noise to the data generation process
- The 1.16pp gap is less than half of one standard error (2.54pp), so it is well within what sampling noise alone would produce

**Key takeaway:** RDD successfully recovered the causal effect despite confounding in the naive comparison (which showed ~13pp).
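
To put the gap between my estimate and the truth on a common scale, this short calculation expresses it in standard errors, using the numbers printed above. The normal approximation for the p-value is my simplifying assumption.

```python
import math

# Values copied from the regression output above.
estimate = 0.0684
std_error = 0.0254
true_effect = 0.08

# Distance between estimate and truth, measured in standard errors.
gap_in_se = (true_effect - estimate) / std_error

# Two-sided p-value for H0: effect = 0.08, under a normal approximation.
p_two_sided = math.erfc(abs(gap_in_se) / math.sqrt(2))

print(f"Gap: {(true_effect - estimate) * 100:.2f}pp = {gap_in_se:.2f} standard errors")
print(f"Two-sided p-value against the true effect: {p_two_sided:.2f}")
```

A gap of under half a standard error means the data give no reason to reject the true 8pp effect.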

## Visualization: Estimate vs Truth

In [5]:
fig = go.Figure()

# True effect
fig.add_trace(go.Scatter(
    x=[8.0],
    y=[0],
    mode='markers+text',
    marker=dict(size=20, color='green', symbol='star'),
    text=['True Effect<br>8.0pp'],
    textposition='top center',
    name='True Effect',
    showlegend=False
))

# Point estimate
fig.add_trace(go.Scatter(
    x=[treatment_effect*100],
    y=[0],
    mode='markers',
    marker=dict(size=15, color='blue'),
    name='My Estimate',
    showlegend=False
))

# Confidence interval
fig.add_shape(
    type='line',
    x0=ci_lower*100, x1=ci_upper*100,
    y0=0, y1=0,
    line=dict(color='blue', width=3),
)

fig.add_annotation(
    x=treatment_effect*100, y=0.05,
    text=f'Estimate: {treatment_effect*100:.2f}pp<br>95% CI: [{ci_lower*100:.2f}, {ci_upper*100:.2f}]',
    showarrow=True,
    arrowhead=2,
    bgcolor='lightblue',
    opacity=0.8
)

fig.update_layout(
    title='RDD Estimate vs True Effect',
    xaxis_title='Treatment Effect (percentage points)',
    yaxis_visible=False,
    height=300,
    xaxis=dict(range=[0, 14])
)

fig.show()
fig.write_image("../outputs/figures/03_rdd_estimate_vs_true_effect.png", scale=2)

## Sensitivity Analysis 1: Does Bandwidth Choice Matter?

**The question:** I chose bandwidth = €20 somewhat arbitrarily. Would I get a different answer with €10 or €30?

**Why this matters:** 
- Narrow bandwidth: Customers more comparable (lower bias), but fewer observations (higher variance)
- Wide bandwidth: More observations (lower variance), but less comparable (higher bias)

**What I'm testing:** Whether my estimate is robust to reasonable bandwidth choices.
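
A common way to soften this tradeoff, which I do not use in this notebook, is to weight observations by their distance from the cutoff instead of treating everyone inside the window equally. The sketch below shows a triangular kernel with weighted least squares. The function names and the stand-in data are hypothetical, and it returns a point estimate only.

```python
import numpy as np

def triangular_weights(centered, bandwidth):
    """Weights fall linearly from 1 at the cutoff to 0 at the bandwidth edge."""
    return np.clip(1.0 - np.abs(centered) / bandwidth, 0.0, None)

def weighted_rdd_jump(centered, outcome, bandwidth):
    """Local linear RDD jump at zero, estimated by weighted least squares."""
    weights = triangular_weights(centered, bandwidth)
    keep = weights > 0
    c, y, w = centered[keep], outcome[keep], weights[keep]
    t = (c >= 0).astype(float)
    X = np.column_stack([np.ones_like(c), c, t, t * c])
    # Scaling rows by sqrt(w) turns weighted least squares into ordinary least squares.
    root_w = np.sqrt(w)
    beta = np.linalg.lstsq(X * root_w[:, None], y * root_w, rcond=None)[0]
    return beta[2]

# Hypothetical stand-in data with a 0.08 jump at the cutoff.
rng = np.random.default_rng(3)
centered = rng.uniform(-30, 30, 6_000)
treated = (centered >= 0).astype(float)
outcome = rng.binomial(1, 0.45 + 0.003 * centered + 0.08 * treated).astype(float)

for bw in (10, 20, 30):
    print(f"bandwidth {bw}: jump = {weighted_rdd_jump(centered, outcome, bw) * 100:.2f}pp")
```

Sessions close to €50 dominate the fit, so widening the bandwidth adds data without letting distant carts count as much as near ones.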

In [6]:
def estimate_rdd_at_bandwidth(df, cutoff, bandwidth):
    """
    Estimate RDD treatment effect for a given bandwidth.
    
    Args:
        df: DataFrame with shopping session data
        cutoff: Treatment cutoff value (€50)
        bandwidth: How far from cutoff to include (e.g., 20 for €30-70)
    
    Returns:
        Dictionary with estimate, std error, CI, p-value, and sample size
    """
    # Filter to bandwidth
    df_temp = df[
        (df['cart_value'] >= cutoff - bandwidth) & 
        (df['cart_value'] <= cutoff + bandwidth)
    ].copy()
    
    # Create RDD variables
    df_temp['cart_centered'] = df_temp['cart_value'] - cutoff
    df_temp['treat_x_cart'] = df_temp['treatment'] * df_temp['cart_centered']
    
    # Run regression
    model = smf.ols(
        'completed_purchase ~ cart_centered + treatment + treat_x_cart',
        data=df_temp
    ).fit()
    
    # Extract results
    return {
        'bandwidth': bandwidth,
        'n_obs': len(df_temp),
        'estimate': model.params['treatment'],
        'std_error': model.bse['treatment'],
        'ci_lower': model.conf_int().loc['treatment', 0],
        'ci_upper': model.conf_int().loc['treatment', 1],
        'p_value': model.pvalues['treatment']
    }

# Test multiple bandwidths
bandwidths_to_test = [5, 10, 15, 20, 25, 30]
results = []

for bw in bandwidths_to_test:
    result = estimate_rdd_at_bandwidth(df, cutoff=50.0, bandwidth=bw)
    results.append(result)

# Convert to DataFrame for easy viewing
results_df = pd.DataFrame(results)

# Format for display
results_df['Estimate (pp)'] = (results_df['estimate'] * 100).round(2)
results_df['Std Error'] = results_df['std_error'].round(4)
results_df['CI Lower (pp)'] = (results_df['ci_lower'] * 100).round(2)
results_df['CI Upper (pp)'] = (results_df['ci_upper'] * 100).round(2)
results_df['p-value'] = results_df['p_value'].round(4)

display_cols = ['bandwidth', 'n_obs', 'Estimate (pp)', 'Std Error', 
                'CI Lower (pp)', 'CI Upper (pp)', 'p-value']

print("\nBANDWIDTH SENSITIVITY ANALYSIS")
print("="*90)
print(results_df[display_cols].to_string(index=False))
print("\n" + "="*90)
print(f"True effect: 8.0 percentage points")
print(f"Estimates range from {results_df['Estimate (pp)'].min():.2f}pp to {results_df['Estimate (pp)'].max():.2f}pp")
print(f"Mean across specifications: {results_df['Estimate (pp)'].mean():.2f}pp")


BANDWIDTH SENSITIVITY ANALYSIS
 bandwidth  n_obs  Estimate (pp)  Std Error  CI Lower (pp)  CI Upper (pp)  p-value
         5   1594           6.83     0.0501          -3.01          16.66   0.1736
        10   3134           4.12     0.0352          -2.77          11.02   0.2412
        15   4590           6.99     0.0290           1.31          12.67   0.0158
        20   5835           6.84     0.0254           1.87          11.81   0.0070
        25   6832           6.62     0.0230           2.11          11.13   0.0040
        30   7613           7.72     0.0214           3.53          11.90   0.0003

True effect: 8.0 percentage points
Estimates range from 4.12pp to 7.72pp
Mean across specifications: 6.52pp


## Bandwidth Sensitivity Visualization

In [7]:
fig = go.Figure()

# Point estimates
fig.add_trace(go.Scatter(
    x=results_df['bandwidth'],
    y=results_df['Estimate (pp)'],
    mode='markers+lines',
    name='Point Estimate',
    marker=dict(size=10, color='blue'),
    line=dict(color='blue', width=2)
))

# Confidence intervals
for _, row in results_df.iterrows():
    fig.add_trace(go.Scatter(
        x=[row['bandwidth'], row['bandwidth']],
        y=[row['CI Lower (pp)'], row['CI Upper (pp)']],
        mode='lines',
        line=dict(color='lightblue', width=2),
        showlegend=False,
        hoverinfo='skip'
    ))

# True effect line
fig.add_hline(
    y=8.0,
    line_dash='dash',
    line_color='green',
    annotation_text='True Effect (8.0pp)',
    annotation_position='right'
)

fig.update_layout(
    title='RDD Estimate Across Different Bandwidths',
    xaxis_title='Bandwidth (€)',
    yaxis_title='Treatment Effect (percentage points)',
    height=500,
    showlegend=True
)

fig.show()
fig.write_image("../outputs/figures/04_bandwidth_sensitivity_analysis.png", scale=2)

## Reflection: Bandwidth Sensitivity

**What I observe:**
- Point estimates range from 4.12pp to 7.72pp; five of the six fall between 6.6pp and 7.7pp
- All confidence intervals capture the true 8pp effect
- Smaller bandwidths have wider CIs (more uncertainty)
- Larger bandwidths are more precise but could introduce bias

**What surprised me:**
- BW=10 gives a lower estimate (4.12pp) with high uncertainty (p=0.24, not significant)
- This looks like noise rather than bias: the estimate sits about 1.1 standard errors below 8pp, and the even narrower BW=5 window gives 6.83pp
- Narrow bandwidths need more data near the cutoff before they can pin the effect down

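**Conclusion:** My main estimate (BW=20, 6.84pp) is in line with its neighbors (6.62pp to 6.99pp at BW=15 to 25), so it is not an artifact of the bandwidth choice. Statistical significance does depend on bandwidth: it is lost at BW=5 and BW=10, where standard errors are larger.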
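
To check how much of the precision gain is simply sample size, this small calculation multiplies each standard error from the bandwidth table by the square root of its sample size. If precision scaled purely as 1/sqrt(n), the product would be constant.

```python
import math

# (bandwidth, n_obs, std_error) copied from the bandwidth sensitivity table above.
table = [
    (5, 1594, 0.0501),
    (10, 3134, 0.0352),
    (15, 4590, 0.0290),
    (20, 5835, 0.0254),
    (25, 6832, 0.0230),
    (30, 7613, 0.0214),
]

scaled = {bw: se * math.sqrt(n) for bw, n, se in table}

for bw, value in scaled.items():
    print(f"bandwidth {bw:>2}: SE x sqrt(n) = {value:.2f}")
```

The product stays between roughly 1.87 and 2.00, drifting down slightly as the window widens. Most of the tighter confidence intervals at wide bandwidths therefore come from having more sessions, which is the variance side of the bias-variance tradeoff.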

## Sensitivity Analysis 2: Does Covariate Imbalance Matter?

**Background:** In notebook 01, I found a slight imbalance in previous_purchases (p=0.043). Control group averaged 2.46 purchases, treatment averaged 2.18 purchases.

**The question:** Does this 0.28 purchase difference bias my estimate?

**My hypothesis:** No, because:
1. The practical difference is tiny (2.46 vs 2.18)
2. RDD already controls for cart value, which correlates with purchase history
3. Adding the control shouldn't change the estimate much

**The test:** Re-run RDD with previous_purchases as a control variable and compare.

In [8]:
# Baseline model (no covariate control)
model_baseline = smf.ols(
    'completed_purchase ~ cart_centered + treatment + treat_x_cart',
    data=df_rdd
).fit()

# Model with previous_purchases control
model_with_control = smf.ols(
    'completed_purchase ~ cart_centered + treatment + treat_x_cart + previous_purchases',
    data=df_rdd
).fit()

print("COVARIATE CONTROL SENSITIVITY TEST")
print("="*70)
print(f"\nBaseline (no control):")
print(f"  Treatment Effect: {model_baseline.params['treatment']*100:.2f}pp")
print(f"  Standard Error: {model_baseline.bse['treatment']:.4f}")
print(f"  95% CI: [{model_baseline.conf_int().loc['treatment', 0]*100:.2f}pp, "
      f"{model_baseline.conf_int().loc['treatment', 1]*100:.2f}pp]")

print(f"\nWith previous_purchases control:")
print(f"  Treatment Effect: {model_with_control.params['treatment']*100:.2f}pp")
print(f"  Standard Error: {model_with_control.bse['treatment']:.4f}")
print(f"  95% CI: [{model_with_control.conf_int().loc['treatment', 0]*100:.2f}pp, "
      f"{model_with_control.conf_int().loc['treatment', 1]*100:.2f}pp]")

difference = abs(model_baseline.params['treatment'] - model_with_control.params['treatment']) * 100
print(f"\nDifference: {difference:.2f} percentage points")

if difference < 0.5:
    print(f"\n✓ The covariate imbalance has negligible impact on the estimate.")
elif difference < 1.0:
    print(f"\n⚠️  Small impact from covariate imbalance, but still minor.")
else:
    print(f"\n✗ Covariate imbalance appears to meaningfully affect the estimate.")

COVARIATE CONTROL SENSITIVITY TEST

Baseline (no control):
  Treatment Effect: 6.84pp
  Standard Error: 0.0254
  95% CI: [1.87pp, 11.81pp]

With previous_purchases control:
  Treatment Effect: 6.89pp
  Standard Error: 0.0254
  95% CI: [1.92pp, 11.87pp]

Difference: 0.05 percentage points

✓ The covariate imbalance has negligible impact on the estimate.


## Reflection: Covariate Control

**What I found:** Adding previous_purchases changes the estimate by less than 0.1 percentage points.

**What this confirms:** 
- The slight statistical imbalance (p=0.043) has no practical effect on the estimate
- Cart value is already in the model and the comparison is local to the cutoff, so purchase history has little left to explain
- My estimate is not driven by differences in purchase history

**Key learning:** Statistical significance (p<0.05) doesn't automatically mean practical significance. What matters for RDD is whether the estimate moves when the covariate is added, and here it moves by only 0.05pp.

**Professional reporting:** I would note the imbalance, show this robustness check, and conclude it doesn't affect results.
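
A complementary balance check is to run the same RDD specification with the covariate as the outcome. A difference in group means across the whole window can reflect a smooth trend in cart value, whereas RDD only requires that the covariate does not jump at the cutoff. The sketch below uses a hypothetical helper `rdd_jump` and simulated purchase counts that are smooth through the cutoff. It is not a re-analysis of my data.

```python
import numpy as np

def rdd_jump(centered, outcome):
    """Jump at zero and its classical standard error from a local linear fit."""
    treatment = (centered >= 0).astype(float)
    X = np.column_stack(
        [np.ones_like(centered), centered, treatment, treatment * centered]
    )
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ outcome
    resid = outcome - X @ beta
    sigma2 = resid @ resid / (len(outcome) - X.shape[1])
    return beta[2], np.sqrt(sigma2 * XtX_inv[2, 2])

# Hypothetical covariate: purchase counts that trend with cart value
# but have no discontinuity at the cutoff.
rng = np.random.default_rng(2)
centered = rng.uniform(-20, 20, 5_000)
previous_purchases = rng.poisson(2.3 + 0.01 * centered).astype(float)

jump, se = rdd_jump(centered, previous_purchases)
print(f"Jump in previous_purchases at cutoff: {jump:.3f} (SE {se:.3f})")
```

On real data, a jump that is small relative to its standard error supports the continuity assumption more directly than a comparison of window-wide means.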

## Placebo Test 1: Fake Cutoff at €40

**The logic:** If my RDD methodology is sound, I should NOT find a discontinuity at €40 (where free shipping doesn't actually kick in).

**Why this matters:** A "significant" effect at €40 would suggest:
- My method is picking up random noise, OR
- There's something else changing at €40 (product mix, customer type), OR
- My continuity assumption is violated

**Expected result:** No significant jump at €40 (p > 0.05, estimate near 0)

In [9]:
def run_placebo_test(df, fake_cutoff, bandwidth=20):
    """
    Test for discontinuity at a fake cutoff where treatment doesn't change.
    
    Args:
        df: Full dataset
        fake_cutoff: Placebo cutoff value (e.g., 40)
        bandwidth: Window around placebo cutoff
    
    Returns:
        Dictionary with placebo test results
    """
    # Filter to bandwidth around fake cutoff
    df_placebo = df[
        (df['cart_value'] >= fake_cutoff - bandwidth) &
        (df['cart_value'] <= fake_cutoff + bandwidth)
    ].copy()
    
    # Create fake treatment (as if cutoff were at fake_cutoff)
    df_placebo['fake_treatment'] = (df_placebo['cart_value'] >= fake_cutoff).astype(int)
    df_placebo['cart_centered'] = df_placebo['cart_value'] - fake_cutoff
    df_placebo['fake_treat_x_cart'] = df_placebo['fake_treatment'] * df_placebo['cart_centered']
    
    # Run RDD regression at fake cutoff
    model = smf.ols(
        'completed_purchase ~ cart_centered + fake_treatment + fake_treat_x_cart',
        data=df_placebo
    ).fit()
    
    return {
        'cutoff': fake_cutoff,
        'n_obs': len(df_placebo),
        'estimate': model.params['fake_treatment'],
        'std_error': model.bse['fake_treatment'],
        'p_value': model.pvalues['fake_treatment'],
        'ci_lower': model.conf_int().loc['fake_treatment', 0],
        'ci_upper': model.conf_int().loc['fake_treatment', 1]
    }

# Run placebo test at €40
placebo_40 = run_placebo_test(df, fake_cutoff=40, bandwidth=20)

print("PLACEBO TEST: Fake Cutoff at €40")
print("="*70)
print(f"Sample size (€20-60): {placebo_40['n_obs']:,}")
print(f"\nEstimate: {placebo_40['estimate']*100:.2f} percentage points")
print(f"Standard Error: {placebo_40['std_error']:.4f}")
print(f"95% CI: [{placebo_40['ci_lower']*100:.2f}pp, {placebo_40['ci_upper']*100:.2f}pp]")
print(f"p-value: {placebo_40['p_value']:.4f}")

if placebo_40['p_value'] > 0.05:
    print(f"\n✓ No significant effect at €40 (as expected - there's no treatment there!)")
    print(f"   This validates that my RDD method isn't finding spurious discontinuities.")
else:
    print(f"\n⚠️  WARNING: Significant effect at placebo cutoff!")
    print(f"   This suggests either: (a) random chance, or (b) violation of assumptions")

PLACEBO TEST: Fake Cutoff at €40
Sample size (€20-60): 5,375

Estimate: -3.46 percentage points
Standard Error: 0.0261
95% CI: [-8.57pp, 1.66pp]
p-value: 0.1852

✓ No significant effect at €40 (as expected - there's no treatment there!)
   This validates that my RDD method isn't finding spurious discontinuities.


## Placebo Test 2: Fake Cutoff at €60

**Same logic:** Testing at €60 where free shipping already applies on both sides, so there should be no jump.

**Why test multiple placebos:** Increases confidence that null results aren't just luck. If I test 2 placebos and both show no effect, the real cutoff showing an effect is more credible.
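
One thing to keep in mind when adding placebos: each test has a 5% false-positive rate, so the chance of at least one spurious "significant" placebo grows with the number of tests. The calculation below assumes the tests are independent, which is a simplification because my placebo windows overlap.

```python
alpha = 0.05

# Probability of at least one false positive across k independent placebo tests.
false_alarm = {k: 1 - (1 - alpha) ** k for k in (1, 2, 5, 10)}

for k, prob in false_alarm.items():
    print(f"{k:>2} placebo tests: P(at least one p < 0.05) = {prob:.3f}")
```

With two placebos the chance of a false alarm is just under 10%, so a single significant placebo among many is not automatically a red flag.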

In [10]:
# Run placebo test at €60
placebo_60 = run_placebo_test(df, fake_cutoff=60, bandwidth=20)

print("PLACEBO TEST: Fake Cutoff at €60")
print("="*70)
print(f"Sample size (€40-80): {placebo_60['n_obs']:,}")
print(f"\nEstimate: {placebo_60['estimate']*100:.2f} percentage points")
print(f"Standard Error: {placebo_60['std_error']:.4f}")
print(f"95% CI: [{placebo_60['ci_lower']*100:.2f}pp, {placebo_60['ci_upper']*100:.2f}pp]")
print(f"p-value: {placebo_60['p_value']:.4f}")

if placebo_60['p_value'] > 0.05:
    print(f"\n✓ No significant effect at €60 (as expected!)")
else:
    print(f"\n⚠️  WARNING: Significant effect at placebo cutoff!")

PLACEBO TEST: Fake Cutoff at €60
Sample size (€40-80): 5,372

Estimate: -2.38 percentage points
Standard Error: 0.0266
95% CI: [-7.60pp, 2.84pp]
p-value: 0.3722

✓ No significant effect at €60 (as expected!)


## Placebo Tests Visualization

In [11]:
# Combine real and placebo estimates
all_estimates = pd.DataFrame([
    {
        'Test': 'Real Cutoff (€50)',
        'Estimate': treatment_effect * 100,
        'CI_lower': ci_lower * 100,
        'CI_upper': ci_upper * 100,
        'p_value': p_value,
        'color': 'blue'
    },
    {
        'Test': 'Placebo (€40)',
        'Estimate': placebo_40['estimate'] * 100,
        'CI_lower': placebo_40['ci_lower'] * 100,
        'CI_upper': placebo_40['ci_upper'] * 100,
        'p_value': placebo_40['p_value'],
        'color': 'gray'
    },
    {
        'Test': 'Placebo (€60)',
        'Estimate': placebo_60['estimate'] * 100,
        'CI_lower': placebo_60['ci_lower'] * 100,
        'CI_upper': placebo_60['ci_upper'] * 100,
        'p_value': placebo_60['p_value'],
        'color': 'gray'
    }
])

fig = go.Figure()

# Point estimates
for _, row in all_estimates.iterrows():
    fig.add_trace(go.Scatter(
        x=[row['Estimate']],
        y=[row['Test']],
        mode='markers',
        marker=dict(
            size=12,
            color=row['color'],
            symbol='diamond' if 'Real' in row['Test'] else 'circle'
        ),
        showlegend=False
    ))
    
    # Confidence intervals
    fig.add_trace(go.Scatter(
        x=[row['CI_lower'], row['CI_upper']],
        y=[row['Test'], row['Test']],
        mode='lines',
        line=dict(color=row['color'], width=3),
        showlegend=False
    ))

# Add zero line
fig.add_vline(x=0, line_dash="dash", line_color="black", opacity=0.5)

# Add true effect line
fig.add_vline(
    x=8.0,
    line_dash="dash",
    line_color="green",
    annotation_text="True Effect (8.0pp)"
)

fig.update_layout(
    title='Real Effect vs Placebo Tests',
    xaxis_title='Treatment Effect (percentage points)',
    yaxis_title='',
    height=400,
    xaxis=dict(range=[-10, 13])  # wide enough to show the full placebo CIs (lower bounds near -8.6pp)
)

fig.show()
fig.write_image("../outputs/figures/05_placebo_tests.png", scale=2)

print("\nInterpretation:")
print("- Real cutoff (€50): Large effect, significant, CI includes true 8pp")
print("- Placebo cutoffs (€40, €60): Small effects, not significant, CIs include 0")


Interpretation:
- Real cutoff (€50): Large effect, significant, CI includes true 8pp
- Placebo cutoffs (€40, €60): Small effects, not significant, CIs include 0


## Reflection: Placebo Tests

**What I found:**
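- Real cutoff (€50): 6.84pp effect, p=0.007 (significant at the 1% level)
- Placebo €40: -3.46pp, p=0.19 (not significant)
- Placebo €60: -2.38pp, p=0.37 (not significant)

**What this validates:**
- The RDD specification isn't flagging discontinuities at arbitrary points
- The jump at €50 is unlikely to be a statistical artifact
- The results are consistent with the continuity assumption (no significant jumps where there shouldn't be)

**Caveat:** With a €20 bandwidth, both placebo windows (€20-60 and €40-80) contain the real €50 cutoff, so the real jump can leak into the placebo fits. Restricting each placebo sample to one side of €50 would give a cleaner test.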

**Why this matters for credibility:**
If I showed only the €50 result, skeptics might say "maybe it's just noise." By showing that placebo cutoffs yield null results, I can demonstrate the method works correctly.
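
A stricter version of the placebo test keeps only sessions on one side of the real cutoff, so the true jump at €50 cannot contaminate the placebo fit. The sketch below shows just the sample selection. `placebo_sample_mask` is a hypothetical helper, and I have not re-run my placebo tests with it.

```python
import numpy as np

def placebo_sample_mask(cart_value, fake_cutoff, bandwidth, true_cutoff=50.0):
    """Select sessions inside the placebo window that share a side of the true cutoff."""
    in_window = np.abs(cart_value - fake_cutoff) <= bandwidth
    if fake_cutoff < true_cutoff:
        same_side = cart_value < true_cutoff    # untreated sessions only
    else:
        same_side = cart_value >= true_cutoff   # treated sessions only
    return in_window & same_side

carts = np.array([15.0, 20.0, 45.0, 50.0, 55.0, 60.0, 80.0, 85.0])

print("Placebo at 40 keeps:", carts[placebo_sample_mask(carts, 40, 20)])
print("Placebo at 60 keeps:", carts[placebo_sample_mask(carts, 60, 20)])

# In the notebook this would correspond to pre-filtering before the existing call, e.g.
# run_placebo_test(df[df['cart_value'] < 50], fake_cutoff=40, bandwidth=20)
```

The cost is a smaller and lopsided sample (€20-50 around a €40 placebo), so the placebo estimates would be noisier than the ones reported above.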

## Summary: Complete Sensitivity Analysis

I tested whether my 6.84pp estimate is robust to:

**1. Bandwidth choice:**
- Tested 6 different bandwidths (€5 to €30)
- Point estimates range from 4.12pp to 7.72pp
- All CIs capture true 8pp effect
- Point estimate is robust; significance at the 5% level holds for bandwidths of €15 and wider

**2. Covariate imbalance:**
- Slight imbalance in previous_purchases (p=0.043)
- Controlling for it changes estimate by <0.1pp
- Imbalance doesn't bias results

**3. Placebo tests:**
- No significant effects at fake cutoffs (€40, €60)
- Real cutoff shows strong effect
- Not finding spurious discontinuities

**Overall assessment:** RDD estimate of 6.84pp is:
- Close to true effect (8.0pp)
- Statistically significant
- Robust to analytical choices
- Validated by placebo tests
- Credible causal estimate

## Key Learnings from This Analysis

**What I learned about RDD:**
1. Point estimates will vary with sampling noise - that's normal
2. Confidence intervals are more informative than point estimates
3. Sensitivity analysis builds credibility - show robustness, not just one specification
4. Placebo tests are powerful validation tools
5. Statistical vs practical significance matters (covariate imbalance example)

**What I'd do differently in real analysis:**
- Use optimal bandwidth selection (Imbens-Kalyanaraman) instead of arbitrary choice
- Test more placebo cutoffs to be thorough
- Check for heterogeneous effects (does free shipping work better for some customers?)
- Be more careful about business context (what else might change at €50?)

**What surprised me:**
- How well RDD recovered the true effect despite noise and confounding
- How little the covariate imbalance actually mattered
- How narrow bandwidths can be too noisy even with 10,000 observations