# Chapter 14: Switchback and Geo-Experiments

This notebook contains all code examples from Chapter 14, demonstrating:
- Power simulation for geo-experiments
- Difference-in-Differences (DiD) regression analysis
- Hypothesis testing with DiD models

## Setup: Install Required Packages

In [None]:
# Install required packages (uncomment if needed)
# !pip install pandas numpy statsmodels

## Import Libraries

In [None]:
import pandas as pd
import numpy as np
import statsmodels.formula.api as smf

## Section 2.3.1: Power Simulation for Geo-Experiments

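This function simulates a single Difference-in-Differences (DiD) experiment and reports whether the treatment effect was detected at significance level `alpha`. Power is then estimated by repeating the simulation many times and taking the share of runs in which the effect was detected.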

In [None]:
def simulate_did_experiment(
    n_geos, n_days_pre, n_days_post,
    base_metric, mde, std_dev,
    alpha=0.05
):
    """
    Simulates a single Difference-in-Differences experiment.

    Args:
        n_geos (int): Total number of geographic units.
        n_days_pre (int): Number of days in the pre-period.
        n_days_post (int): Number of days in the post-period.
        base_metric (float): The baseline average value of the metric per geo per day.
        mde (float): The Minimum Detectable Effect (the true effect to simulate).
        std_dev (float): The standard deviation of the metric at the geo-day level.
        alpha (float): The significance level for the hypothesis test.

    Returns:
        bool: True if the 'treat:post' interaction is significant at level
            alpha (the null hypothesis of no effect is rejected), False otherwise.
    """
    # 1. Create the dataset structure
    geos = range(n_geos)
    days = range(n_days_pre + n_days_post)
    
    df = pd.DataFrame([(g, d) for g in geos for d in days], columns=['geo', 'day'])
    
    # 2. Assign treatment/control and pre/post periods
    df['treat'] = (df['geo'] >= n_geos // 2).astype(int)
    df['post'] = (df['day'] >= n_days_pre).astype(int)
    
    # 3. Simulate the metric data
    # Add random noise
    noise = np.random.normal(0, std_dev, df.shape[0])
    
    # The treatment effect is only applied to the treatment group in the post period
    treatment_effect = df['treat'] * df['post'] * mde
    
    df['metric'] = base_metric + noise + treatment_effect
    
    # 4. Run the DiD regression
    model = smf.ols('metric ~ treat * post', data=df).fit()
    
    # 5. Check for significance
    p_value = model.pvalues['treat:post']
    
    return p_value < alpha
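
The simulation above draws independent noise for every geo-day. When day-to-day shocks within a geo are correlated, default OLS standard errors tend to understate the uncertainty of the DiD estimate. The sketch below is one hedged way to probe that case: it generates AR(1) noise within each geo and asks statsmodels for standard errors clustered by geo. The function name `simulate_did_clustered`, the `effect` argument, and the `rho` autocorrelation parameter are illustrative assumptions, not part of the chapter's code.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf


def simulate_did_clustered(n_geos, n_days_pre, n_days_post, base_metric,
                           effect, std_dev, rho=0.5, alpha=0.05, rng=None):
    """One simulated DiD experiment with AR(1) noise within each geo.

    `rho` (day-to-day autocorrelation) is a hypothetical parameter added
    for illustration. Returns True if 'treat:post' is significant at
    `alpha` using standard errors clustered by geo.
    """
    rng = np.random.default_rng() if rng is None else rng
    n_days = n_days_pre + n_days_post

    # AR(1) noise per geo, scaled so the marginal std dev stays `std_dev`
    shocks = rng.normal(0, std_dev * np.sqrt(1 - rho**2), (n_geos, n_days))
    noise = np.empty((n_geos, n_days))
    noise[:, 0] = rng.normal(0, std_dev, n_geos)
    for t in range(1, n_days):
        noise[:, t] = rho * noise[:, t - 1] + shocks[:, t]

    # One row per geo-day; row order matches noise.ravel() (geo-major)
    df = pd.DataFrame({
        'geo': np.repeat(np.arange(n_geos), n_days),
        'day': np.tile(np.arange(n_days), n_geos),
    })
    df['treat'] = (df['geo'] >= n_geos // 2).astype(int)
    df['post'] = (df['day'] >= n_days_pre).astype(int)
    df['metric'] = (base_metric + effect * df['treat'] * df['post']
                    + noise.ravel())

    # Same DiD formula, but with cluster-robust standard errors by geo
    model = smf.ols('metric ~ treat * post', data=df).fit(
        cov_type='cluster', cov_kwds={'groups': df['geo']}
    )
    return bool(model.pvalues['treat:post'] < alpha)


# Small example run with a seeded generator
detected = simulate_did_clustered(20, 10, 10, 1000, 50, 250,
                                  rng=np.random.default_rng(0))
print(detected)
```

Setting `effect=0` and counting how often this returns True gives a rough false-positive rate, which can be compared against the unclustered version under the same correlated noise.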

## Running the Power Simulation

We estimate the power of a design with 50 geos (25 treatment, 25 control) by running 500 simulations and recording the share that detect the effect.

In [None]:
# --- Running the power simulation ---
N_SIMULATIONS = 500
significant_results = 0

# We are testing the power for a design with 50 geos
n_geos_to_test = 50

for _ in range(N_SIMULATIONS):
    if simulate_did_experiment(
        n_geos=n_geos_to_test, # 50 total cities (25 treat, 25 control)
        n_days_pre=30,      # 30 days of historical data
        n_days_post=30,     # 30 days for the experiment
        base_metric=1000,   # Avg 1000 rides/day
        mde=50,             # We want to detect a lift of 50 rides
        std_dev=250         # Std dev of rides at city-day level is 250
    ):
        significant_results += 1

power = significant_results / N_SIMULATIONS
print(f"Estimated Power with {n_geos_to_test} geos: {power:.2f}")
# Expected output: approximately 0.80. The run is unseeded, so the
# estimate typically moves by a few hundredths from one run to the next.
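
Because the simulated noise is independent across geo-days, the simulated power can be cross-checked against a closed-form normal approximation. The sketch below is a rough sanity check under that independence assumption, not a replacement for the simulation. The helper name `analytic_did_power` is hypothetical. It also reports the Monte Carlo standard error of a power estimate based on 500 runs.

```python
from math import sqrt
from statistics import NormalDist


def analytic_did_power(n_treat, n_control, n_days_pre, n_days_post,
                       effect, std_dev, alpha=0.05):
    """Normal-approximation power for a two-sided DiD test with i.i.d. noise."""
    # The DiD estimate combines four independent cell means; variances add
    variance = std_dev**2 * (
        1 / (n_treat * n_days_pre) + 1 / (n_treat * n_days_post)
        + 1 / (n_control * n_days_pre) + 1 / (n_control * n_days_post)
    )
    se = sqrt(variance)
    norm = NormalDist()
    z_crit = norm.inv_cdf(1 - alpha / 2)
    z_effect = effect / se
    # Probability that the test statistic lands beyond either critical value
    return norm.cdf(z_effect - z_crit) + norm.cdf(-z_effect - z_crit)


power = analytic_did_power(25, 25, 30, 30, effect=50, std_dev=250)
# Standard error of a simulated power estimate based on 500 runs
mc_se = sqrt(power * (1 - power) / 500)
print(f"Analytic power: {power:.3f}, Monte Carlo SE at 500 runs: {mc_se:.3f}")
```

For the design above this gives a power of about 0.78 with a Monte Carlo standard error near 0.02, so simulated values a few hundredths either side of that are consistent with it.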

## Section 2.3.2: DiD Regression and Hypothesis Testing

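Generate a noisy synthetic dataset with a pre-existing difference between groups, a post-period shift shared by all geos, and a small true treatment effect. Then fit a DiD regression and interpret the results.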

In [None]:
# 1. Generate a realistic, noisy dataset for the experiment
np.random.seed(42)  # for reproducibility
n_geos_per_group = 20
n_days_pre = 30
n_days_post = 30
total_geos = n_geos_per_group * 2
total_days = n_days_pre + n_days_post

# Create the main dataframe
geos = range(total_geos)
days = range(total_days)
df = pd.DataFrame([(g, d) for g in geos for d in days], columns=['geo', 'day'])

# Assign treatment/control and pre/post periods
df['treat'] = (df['geo'] >= n_geos_per_group).astype(int)
df['post'] = (df['day'] >= n_days_pre).astype(int)

# Simulate the metric data with noise and a SMALL treatment effect
base_rides = 800
pre_period_diff = 200  # Pre-existing difference between groups
time_trend = 80        # Post-period shift shared by all geos (a step, not a slope)
treatment_effect = 15  # This is the SMALL true effect we want to find

# Add random noise at the geo-day level
noise = np.random.normal(0, 150, df.shape[0])

# Build the metric
df['rides'] = (
    base_rides
    + pre_period_diff * df['treat']  # Treatment geos start higher
    + time_trend * df['post']        # All geos increase in post-period
    + treatment_effect * df['treat'] * df['post'] # The interaction effect
    + noise
)
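
Before fitting a model, it can help to look at the four cell means that DiD is built from. The sketch below regenerates a dataset with the same ingredients so that it runs on its own, then recovers the pre-period gap, the control group's change over time, and the DiD estimate by hand. The names `demo`, `cell_means`, and the seeded generator are assumptions for illustration, so the numbers will not match the notebook's `df` exactly.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
demo = pd.DataFrame(
    [(g, d) for g in range(40) for d in range(60)], columns=['geo', 'day']
)
demo['treat'] = (demo['geo'] >= 20).astype(int)
demo['post'] = (demo['day'] >= 30).astype(int)
demo['rides'] = (
    800 + 200 * demo['treat'] + 80 * demo['post']
    + 15 * demo['treat'] * demo['post']
    + rng.normal(0, 150, len(demo))
)

# Mean rides in each of the four treat x post cells (rows: treat, cols: post)
cell_means = demo.groupby(['treat', 'post'])['rides'].mean().unstack('post')

# Pre-existing gap between groups, measured in the pre-period only
pre_gap = cell_means.loc[1, 0] - cell_means.loc[0, 0]
# Change over time within each group
change_control = cell_means.loc[0, 1] - cell_means.loc[0, 0]
change_treat = cell_means.loc[1, 1] - cell_means.loc[1, 0]
# DiD: the treated group's change beyond the control group's change
did_estimate = change_treat - change_control

print(cell_means.round(1))
print(f"Pre-period gap: {pre_gap:.1f}")
print(f"Control change: {change_control:.1f}")
print(f"Hand-computed DiD estimate: {did_estimate:.1f}")
```

The gap and the control change should land close to the simulated 200 and 80. The DiD estimate is much noisier relative to the true effect of 15, which is the point of choosing a small effect here.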

## Fit the DiD Model

Run an Ordinary Least Squares (OLS) regression with the DiD formula.

In [None]:
# 2. Fit the DiD Ordinary Least Squares (OLS) model
# The formula 'rides ~ treat * post' automatically creates the main effects
# for 'treat' and 'post' as well as the interaction term 'treat:post'.
model = smf.ols('rides ~ treat * post', data=df).fit()

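# 3. Print the summary; the 'treat:post' row is the DiD estimate of the
# treatment effect, with its standard error, p-value, and confidence interval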
print(model.summary())
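
The full summary table is long, and usually only the interaction row matters. The sketch below shows one way to pull the DiD estimate, its confidence interval, and its p-value directly from a fitted statsmodels result. It builds a small dataset of its own so that it runs standalone. The names `toy` and `fit` are illustrative assumptions, and the same attribute lookups should apply to the `model` object fitted above.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Small standalone 2x2 dataset: 100 observations per treat x post cell
rng = np.random.default_rng(7)
toy = pd.DataFrame({
    'treat': np.repeat([0, 1], 200),
    'post': np.tile(np.repeat([0, 1], 100), 2),
})
toy['rides'] = (800 + 200 * toy['treat'] + 80 * toy['post']
                + 15 * toy['treat'] * toy['post']
                + rng.normal(0, 150, len(toy)))

fit = smf.ols('rides ~ treat * post', data=toy).fit()

# Pull out just the interaction term
did_coef = fit.params['treat:post']
ci_low, ci_high = fit.conf_int(alpha=0.05).loc['treat:post']
p_value = fit.pvalues['treat:post']

# In this saturated 2x2 model the coefficient equals the
# difference of differences in cell means
means = toy.groupby(['treat', 'post'])['rides'].mean()
by_hand = ((means.loc[(1, 1)] - means.loc[(1, 0)])
           - (means.loc[(0, 1)] - means.loc[(0, 0)]))

print(f"DiD estimate: {did_coef:.1f} (95% CI {ci_low:.1f} to {ci_high:.1f})")
print(f"p-value: {p_value:.3f}")
print(f"Matches hand computation: {np.isclose(did_coef, by_hand)}")
```

A confidence interval that includes zero means the experiment cannot distinguish the effect from noise at that level. With a true effect this small relative to the noise, that outcome is plausible.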