# Differences-in-differences: Pre and Post, Treatment and Control

### When to Use

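Difference-in-differences (DiD) is an approach to causal inference for settings without randomized treatment assignment, where we have (1) a comparison (control) group that serves as the counterfactual, (2) an intervention/treatment applied at a known time `t`, and (3) outcome measurements both before and after the intervention.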

<img src="../did.png" style="width: 500px;">

### Assumptions

**Parallel pre-intervention trends:** The treatment and control groups have parallel trends in the outcome. This means that, in the absence of the intervention, the difference between the treatment and control groups would be constant over time. You may have to condition on a set of confounding variables to align the trends.

**No spillover:** The intervention affects only the treatment group.
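One informal way to probe the parallel-trends assumption is to compare the gap between the group means across several pre-intervention periods: if the trends are parallel, the gap should stay roughly constant. The sketch below uses made-up numbers and hypothetical names (`pre_means`, `gaps`, `gap_changes`); it is a descriptive check under those assumptions, not a formal test.

```python
# Hypothetical pre-intervention group means by period (illustrative numbers only).
pre_means = {
    "treatment": [20.0, 21.0, 22.5, 23.5],
    "control":   [17.0, 18.0, 19.5, 20.5],
}

# Gap between the two groups in each pre-intervention period.
gaps = [t - c for t, c in zip(pre_means["treatment"], pre_means["control"])]

# Period-to-period change in the gap; values near zero are consistent
# with parallel pre-intervention trends.
gap_changes = [later - earlier for earlier, later in zip(gaps, gaps[1:])]

print("gaps:", gaps)
print("changes in gap:", gap_changes)
```

If the changes in the gap drift steadily away from zero, the groups were already diverging before the intervention, and the DiD estimate would likely absorb that drift.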



## Approach

The idea behind DiD is simple. First, we compute the difference in the mean outcome between the two groups in the “Before” period, labeled (A) in the graph above. Second, we compute the same difference for the “After” period, labeled (B). Then we take the “second difference”, (B) minus (A), labeled (C). This second difference measures how the change in outcome differs between the two groups. Under the assumptions above, it is attributed to the causal effect of the intervention.
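The two-step computation described above can be sketched in a few lines. The observations and the helper name `did_estimate` are hypothetical and chosen only to illustrate the arithmetic.

```python
from statistics import mean

# Hypothetical outcome observations per group and period (illustrative only).
outcomes = {
    ("treatment", "before"): [10.0, 12.0, 11.0],
    ("treatment", "after"):  [15.0, 17.0, 16.0],
    ("control", "before"):   [8.0, 9.0, 10.0],
    ("control", "after"):    [10.0, 11.0, 12.0],
}

def did_estimate(obs):
    """Return the second difference: (after gap) minus (before gap)."""
    m = {cell: mean(values) for cell, values in obs.items()}
    diff_before = m[("treatment", "before")] - m[("control", "before")]  # (A)
    diff_after = m[("treatment", "after")] - m[("control", "after")]     # (B)
    return diff_after - diff_before                                      # (C)

print(did_estimate(outcomes))
```

With these numbers the groups differ by 2 before and by 5 after, so the estimate is 3.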


## Control

### Choosing a Control

Choice of the control group is critical in DiD. The control group acts as the counterfactual, or “what if”, scenario. **The control must have the same expected trend in the outcome that the treatment group would have had without the intervention**, i.e., the two groups follow parallel trends over time. To achieve this you may have to:

1. Subset/filter your control or treatment groups so your populations match. [Abadie and Gardeazabal, 2001](https://www.nber.org/system/files/working_papers/w8478/w8478.pdf)

2. Condition on population characteristics in a regression model, e.g., income, age, etc. [Card and Krueger, 1994](https://davidcard.berkeley.edu/papers/njmin-aer.pdf)


## Example

To make this concrete, suppose we are interested in the effect of the minimum wage on employment, a classic question in labor economics. In a competitive labor market, increases in the minimum wage move us up a downward-sloping demand curve. Higher minimums therefore reduce employment, perhaps hurting the very workers minimum-wage policies were designed to help. Card and Krueger (1994) use a dramatic change in the New Jersey state minimum wage to see if this is true.

In [1]:
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
import numpy as np
import matplotlib.pyplot as plt

%matplotlib inline

In [2]:
data = pd.read_csv('../data/njmin3.csv')

In [3]:
data.head()

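| Unnamed: 0 | CO_OWNED | SOUTHJ | CENTRALJ | PA1 | PA2 | DEMP | nj | bk | kfc | roys | wendys | d | d_nj | fte |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 1 | 0 | 0 | 12.0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 15.0 |
| 1 | 0 | 0 | 1 | 0 | 0 | 6.5 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 15.0 |
| 2 | 0 | 0 | 1 | 0 | 0 | -1.0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 24.0 |
| 3 | 1 | 0 | 0 | 0 | 0 | 2.25 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 19.25 |
| 4 | 0 | 0 | 0 | 0 | 0 | 13.0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 21.5 |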


The difference-in-differences estimator $\hat{\delta}$ is defined in Equation 7, where the subscripts $T$ and $C$ denote the treatment and control groups and $A$ and $B$ denote the after and before periods.

$$\hat{\delta} = (\bar{y}_{T,A} - \bar{y}_{C,A}) - (\bar{y}_{T,B} - \bar{y}_{C,B}) \tag{7}$$

Instead of manually calculating the four means and their difference-in-differences, it is possible to estimate the difference-in-differences estimator and its statistical properties by running a regression that includes indicator variables for treatment and after and their interaction term. The advantage of a regression over simply using Equation 7 is that the regression can account for other factors that might influence the outcome. The simplest difference-in-differences regression model is presented in Equation 8, where $y_{it}$ is the response for unit $i$ in period $t$. In the typical difference-in-differences model there are only two periods, before and after.

$$y_{it} = \beta_1 + \beta_2 T + \beta_3 A + \delta \, T \times A + e_{it} \tag{8}$$

With a little algebra it can be seen that the coefficient $\delta$ on the interaction term in Equation 8 is exactly the difference-in-differences estimator defined in Equation 7. The following example calculates this estimator for the dataset `njmin3`, where the response is `fte`, the full-time equivalent employment; `d` is the after dummy, with $d = 1$ for the after period and $d = 0$ for the before period; and `nj` is the dummy that marks the treatment group ($nj_i = 1$ if unit $i$ is in New Jersey, where the minimum wage law has been changed, and $nj_i = 0$ if unit $i$ is in Pennsylvania, where the minimum wage law has not changed). In other words, units (fast-food restaurants) located in New Jersey form the treatment group, and units located in Pennsylvania form the control group.
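The algebra can be sketched as follows, assuming the error term has zero conditional mean given the two indicators, $E[e_{it} \mid T, A] = 0$ (an assumption added here for the illustration). Taking expectations of Equation 8 in each of the four group-period cells and then forming the second difference gives:

```latex
\begin{aligned}
E[y \mid T=0, A=0] &= \beta_1 \\
E[y \mid T=1, A=0] &= \beta_1 + \beta_2 \\
E[y \mid T=0, A=1] &= \beta_1 + \beta_3 \\
E[y \mid T=1, A=1] &= \beta_1 + \beta_2 + \beta_3 + \delta \\[4pt]
\big(E[y \mid T=1, A=1] - E[y \mid T=0, A=1]\big)
  - \big(E[y \mid T=1, A=0] - E[y \mid T=0, A=0]\big)
  &= (\beta_2 + \delta) - \beta_2 = \delta
\end{aligned}
```

Replacing the expectations with the corresponding sample means yields the four-means expression in Equation 7.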

In [4]:
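# Drop restaurants with a missing outcome before fitting
data = data.dropna(subset=['fte'])
# Outcome vector and regressor matrix, kept for array-based APIs;
# the formula interface used below reads `data` directly
y = data['fte']
X = data.drop(['fte', 'DEMP'], axis=1)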

In [7]:
# `nj*d` expands to nj + d + nj:d; the coefficient on nj:d is the DiD estimate
model = smf.ols("fte ~ nj + d + nj*d + bk + kfc + roys + wendys + CO_OWNED + SOUTHJ + CENTRALJ + PA1 + PA2", data=data)
res = model.fit()
res.summary()

| | | | |
|---|---|---|---|
| Dep. Variable: | fte | R-squared: | 0.22 |
| Model: | OLS | Adj. R-squared: | 0.21 |
| Method: | Least Squares | F-statistic: | 22.03 |
| Date: | Mon, 30 Nov 2020 | Prob (F-statistic): | 1.72e-36 |
| Time: | 07:20:30 | Log-Likelihood: | -2808.8 |
| No. Observations: | 794 | AIC: | 5640.0 |
| Df Residuals: | 783 | BIC: | 5691.0 |
| Df Model: | 10 | | |
| Covariance Type: | nonrobust | | |

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| Intercept | -1.101e+14 | 8.73e+13 | -1.261 | 0.208 | -2.81e+14 | 6.12e+13 |
| nj | 1.058e+14 | 8.39e+13 | 1.261 | 0.208 | -5.88e+13 | 2.7e+14 |
| d | -2.2059 | 1.350 | -1.633 | 0.103 | -4.857 | 0.445 |
| nj:d | 2.7258 | 1.506 | 1.810 | 0.071 | -0.230 | 5.682 |
| bk | 4.307e+12 | 3.41e+12 | 1.261 | 0.208 | -2.4e+12 | 1.1e+13 |
| kfc | 4.307e+12 | 3.41e+12 | 1.261 | 0.208 | -2.4e+12 | 1.1e+13 |
| roys | 4.307e+12 | 3.41e+12 | 1.261 | 0.208 | -2.4e+12 | 1.1e+13 |
| wendys | 4.307e+12 | 3.41e+12 | 1.261 | 0.208 | -2.4e+12 | 1.1e+13 |
| CO_OWNED | -0.6866 | 0.720 | -0.953 | 0.341 | -2.100 | 0.727 |

| | | | |
|---|---|---|---|
| Omnibus: | 291.202 | Durbin-Watson: | 2.043 |
| Prob(Omnibus): | 0.0 | Jarque-Bera (JB): | 1783.184 |
| Skew: | 1.529 | Prob(JB): | 0.0 |
| Kurtosis: | 9.675 | Cond. No. | 3.46e+15 |
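The very large coefficients on `Intercept`, `nj`, and the chain dummies, together with a condition number around 3.46e+15, suggest that the design matrix is (near-)perfectly collinear. One plausible cause, stated here as an assumption rather than something verified against the data, is the dummy-variable trap: if `bk`, `kfc`, `roys`, and `wendys` are mutually exclusive and exhaustive chain indicators, they sum to the intercept column. The sketch below illustrates the issue on a small synthetic design matrix with NumPy; the arrays and names (`chain`, `X_full`, `X_ref`) are hypothetical.

```python
import numpy as np

# Synthetic example: 8 restaurants, each belonging to exactly one of 4 chains.
chain = np.array([0, 1, 2, 3, 0, 1, 2, 3])
dummies = np.eye(4)[chain]              # one indicator column per chain
intercept = np.ones((len(chain), 1))

# Intercept plus all four dummies: the dummies sum to the intercept column,
# so one column is redundant.
X_full = np.hstack([intercept, dummies])

# Dropping one dummy (the reference category) removes the redundancy.
X_ref = np.hstack([intercept, dummies[:, 1:]])

print(X_full.shape[1], np.linalg.matrix_rank(X_full))
print(X_ref.shape[1], np.linalg.matrix_rank(X_ref))
```

If the same pattern holds in `njmin3`, omitting one chain dummy from the formula (and possibly one regional dummy, if those are also exhaustive) would leave the remaining coefficients identified. The `d` and `nj:d` estimates in the table look stable, but it would be worth confirming them after refitting.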


In [24]:
data.columns

Index(['CO_OWNED', 'SOUTHJ', 'CENTRALJ', 'PA1', 'PA2', 'DEMP', 'nj', 'bk',
       'kfc', 'roys', 'wendys', 'd', 'd_nj', 'fte'],
      dtype='object')

Further Reading:

https://medium.com/analytics-vidhya/identify-causality-by-difference-in-differences-78ad8335fb7c