#### Exercise 4 - Starter - Difference-in-Differences

Your company sells dog toys and cat toys (possibly among other items). It hired a professional photographer who overhauled the product photos for dog toys. Starting on Oct 1, 2024, every customer who viewed dog toy products on the website saw the new photos.

Three months later, you are tasked with evaluating the impact of the professional photos on revenue to determine whether the investment is worth making for additional products.

We have generated monthly panel data on revenue from dog toys and cat toys for 2023 and 2024. Revenue is driven by a baseline (separate for each product), seasonality, a general holiday-season bump, a 'ground truth' effect for the professional photos, and random noise.
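
Written out as an equation, the revenue for product $p$ in month $t$ (with calendar month number $m_t$) that the generating code in this notebook produces is roughly the following. The notation ($p$, $t$, $m_t$, $\varepsilon$) is introduced here for illustration only; the constants are read off the code cell.

$$
\text{Revenue}_{p,t} = 10000 + 200\,\mathbb{1}[p=\text{Dog}] + 500\sin\!\left(\frac{2\pi\,(m_t-1)}{12}\right) + 1000\,\mathbb{1}[m_t \ge 11] + 1000\,\mathbb{1}[p=\text{Dog}]\,\mathbb{1}[t \ge \text{Oct 2024}] + \varepsilon_{p,t}, \qquad \varepsilon_{p,t} \sim N(0,\,50^2)
$$

Because seasonality and the holiday bump are shared by both products, they cancel when the two series are differenced:

$$
\text{Revenue}_{\text{Dog},t} - \text{Revenue}_{\text{Cat},t} = 200 + 1000\,\mathbb{1}[t \ge \text{Oct 2024}] + \left(\varepsilon_{\text{Dog},t} - \varepsilon_{\text{Cat},t}\right)
$$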

The benefit of generating data this way is that we know what the true effect is and we can test how well the model performs under various circumstances.

In [1]:
import pandas as pd
import numpy as np
import statsmodels.formula.api as smf 
import seaborn as sns
import matplotlib.pyplot as plt

In [2]:
# Set random seed for reproducibility
np.random.seed(42)

# Parameters
n_months = 24
products = ["Dog Toys", "Cat Toys"]
months = pd.date_range(start="2023-01-01", periods=n_months, freq='MS')

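# Marketing event date (new dog-toy photos go live)
event_date = pd.to_datetime('2024-10-01')

# Estimation window used by the models below: [model_start_date, model_end_date)
model_start_date = '2024-01-01'
model_end_date = '2025-01-01'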


# Create base dataframe
df = pd.DataFrame([(month, product) for month in months for product in products],
                  columns=["Month", "Product"])

# Add month number for seasonality
df["Month_Num"] = df["Month"].dt.month

# Seasonality: smooth wave + November-December holiday bump
df["Seasonality"] = 500 * np.sin((df["Month_Num"] - 1) / 12 * 2 * np.pi)
df["HolidayBump"] = np.where(df["Month_Num"] >= 11, 1000, 0)

# Add randomness and slight product-level difference
df["BaseRevenue"] = 10000
df["ProductGapBaseline"] = df["Product"].map({"Dog Toys": 200, "Cat Toys": 0})
df["Noise"] = np.random.normal(0, 50, size=len(df))


#%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
#%%%%%  This is the ground-truth effect of the treatment: 
#%%%%%  an extra $1000 monthly revenue for dog toys in Q4 2024 
#%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

df['BetterPhotos'] = np.where((df['Product']=='Dog Toys') & (df['Month'] >= event_date), 1000, 0)

# Final revenue
df["Revenue"] = (
    df["BaseRevenue"]
    + df["ProductGapBaseline"]
    + df["Seasonality"]
    + df["HolidayBump"]
    + df["BetterPhotos"]
    + df["Noise"]
)

# Final output DataFrame
df = df[["Month", "Product", "Revenue"]]


# TO-DO: Visualize the time series 
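
One possible way to approach the visualization is sketched below. It is not the intended solution, only a minimal example: `plot_revenue` is a hypothetical helper, and `demo` is a small made-up stand-in frame with the same `Month`, `Product`, and `Revenue` columns as `df`, so the block runs on its own.

```python
import pandas as pd
import matplotlib.pyplot as plt


def plot_revenue(panel, event=None):
    """Plot Revenue over Month with one line per Product (hypothetical helper)."""
    # Reshape long -> wide: one column of revenue per product
    wide = panel.pivot(index="Month", columns="Product", values="Revenue")
    fig, ax = plt.subplots(figsize=(10, 4))
    for product in wide.columns:
        ax.plot(wide.index, wide[product], marker="o", label=product)
    if event is not None:
        # Dashed vertical line marking the start of the treatment
        ax.axvline(pd.to_datetime(event), color="gray", linestyle="--",
                   label="New photos")
    ax.set_xlabel("Month")
    ax.set_ylabel("Revenue ($)")
    ax.legend()
    return ax


# Made-up stand-in data (NOT the notebook's df): six months, two products
demo_months = pd.date_range("2024-07-01", periods=6, freq="MS")
demo = pd.DataFrame({
    "Month": list(demo_months) * 2,
    "Product": ["Dog Toys"] * 6 + ["Cat Toys"] * 6,
    "Revenue": [10200, 10200, 10200, 11200, 11200, 11200,
                10000, 10000, 10000, 10000, 10000, 10000],
})
ax = plot_revenue(demo, event="2024-10-01")
```

In this notebook the equivalent call would be `plot_revenue(df, event=event_date)`. Plotting both products on the same axes makes it easier to judge by eye whether they move together before October 2024.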



What would happen if we ran the interrupted time series (ITS) model from the previous lesson on dog toys alone?
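
To build intuition before filling in the cells below, here is a rough sketch on a noise-free stand-in for the 2024 dog-toy series. The specification (intercept, linear trend `t`, and a `post` level shift) is an assumption, since the previous lesson's exact model is not shown here, and `np.linalg.lstsq` stands in for the `smf.ols` call.

```python
import numpy as np
import pandas as pd

# Noise-free stand-in for 2024 dog-toy revenue, mirroring the generating code
months = pd.date_range("2024-01-01", periods=12, freq="MS")
m = months.month.to_numpy()
event = pd.Timestamp("2024-10-01")

revenue = (
    10000 + 200                                   # base revenue + dog-toy gap
    + 500 * np.sin((m - 1) / 12 * 2 * np.pi)      # seasonality
    + np.where(m >= 11, 1000, 0)                  # holiday bump (Nov-Dec)
    + np.where(months >= event, 1000, 0)          # true photo effect
)

# Assumed ITS design: intercept, linear time trend, post-event level shift
t = np.arange(len(months))
post = (months >= event).astype(int)
X = np.column_stack([np.ones(len(months)), t, post])

coef, *_ = np.linalg.lstsq(X, revenue, rcond=None)
its_post_coef = coef[2]
print(f"ITS post coefficient: {its_post_coef:.1f} (true effect: 1000)")
```

Under this assumed specification the `post` coefficient overshoots the true $1,000, because with dog toys alone the model has no way to separate the photo effect from the holiday bump and seasonal movement that occur in the same months.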

In [None]:
dogs = df[(df['Product']=='Dog Toys') & (df['Month'] >= model_start_date) & (df['Month'] < model_end_date)].copy()

# TO-DO: Estimate an Interrupted Time Series using only data for dog toys 

formula = 
model = smf.ols(formula, data=dogs).fit()
print(model.summary())

In [None]:
### TO-DO: Calculate the average monthly effect from the ITS

its_effect = 
print(f"Avg Monthly Effect:{its_effect.sum()/dogs['post'].sum() : 0.1f}")


#### Estimate the difference-in-differences model

In [None]:
# TO-DO: Estimate the difference-in-differences model 
df['treated'] = 
df['post'] = 
df['post_treated'] = 

subset = 
formula = 
model = smf.ols(formula, data=subset).fit()
print(model.summary())

                            OLS Regression Results                            
Dep. Variable:                Revenue   R-squared:                       0.532
Model:                            OLS   Adj. R-squared:                  0.462
Method:                 Least Squares   F-statistic:                     7.580
Date:                Sun, 15 Jun 2025   Prob (F-statistic):            0.00141
Time:                        09:46:38   Log-Likelihood:                -176.89
No. Observations:                  24   AIC:                             361.8
Df Residuals:                      20   BIC:                             366.5
Df Model:                           3                                         
Covariance Type:            nonrobust                                         
                   coef    std err          t      P>|t|      [0.025      0.975]
--------------------------------------------------------------------------------
Intercept     1.012e+04    140.326     72.127   

In [None]:
### TO-DO: What is the average monthly effect from the Diff-in-Diff model? How does it compare to ground truth?

did_effect = 
print(f"Avg Monthly Effect:{did_effect.sum()/subset['post_treated'].sum() : 0.1f}")


Avg Monthly Effect: 968.7
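
As a cross-check on the regression, the difference-in-differences estimate in a two-group, two-period setup can also be computed directly from four group means. The sketch below does this on a noise-free stand-in for the 2024 panel (rebuilt here so the block runs on its own; `panel`, `means`, and `did` are illustrative names, not part of the exercise).

```python
import numpy as np
import pandas as pd

# Noise-free stand-in for the 2024 panel, mirroring the generating code
months = pd.date_range("2024-01-01", periods=12, freq="MS")
panel = pd.DataFrame(
    [(month, product) for month in months for product in ["Dog Toys", "Cat Toys"]],
    columns=["Month", "Product"],
)
m = panel["Month"].dt.month
panel["treated"] = (panel["Product"] == "Dog Toys").astype(int)
panel["post"] = (panel["Month"] >= pd.Timestamp("2024-10-01")).astype(int)
panel["Revenue"] = (
    10000
    + 200 * panel["treated"]                      # dog-toy baseline gap
    + 500 * np.sin((m - 1) / 12 * 2 * np.pi)      # shared seasonality
    + np.where(m >= 11, 1000, 0)                  # shared holiday bump
    + 1000 * panel["treated"] * panel["post"]     # true photo effect
)

# Mean revenue in each (treated, post) cell
means = panel.groupby(["treated", "post"])["Revenue"].mean()

# (dog post - dog pre) - (cat post - cat pre)
did = (means.loc[(1, 1)] - means.loc[(1, 0)]) - (means.loc[(0, 1)] - means.loc[(0, 0)])
print(f"Double difference of means: {did:.1f}")
```

Without noise, the shared seasonality and holiday bump cancel and the double difference recovers the $1,000 effect. With the notebook's noisy data the estimate will differ from 1,000 by sampling error, as in the 968.7 reported above.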
