# Causal Inference for Policy Evaluation

Susan Athey, Stanford

06/20/2016

## Causal Inference in Social Sciences

For example:
- What was the impact of the policy?
  - Minimum wage, training program, class size change, etc.
- Did the advertising campaign work? What was the ROI?


__Training data scientists to answer *business questions* rather than *research questions*__

Two questions to consider:
1. What would happen to prices, consumption, consumer welfare, and firm profits if two firms merge?
2. What would happen to platform revenue, advertiser profits, and consumer welfare if we switch from a generalized second-price auction to a Vickrey auction?

In other words... *what would happen if everything suddenly changed?*

## Correlation vs. Causality in Search

What is the effect of the position in which you place a link?
- There is a large correlation between page location and clickthrough rate
  - But is the link at the top because it's good? Or is it "good" because it's at the top?
  
How to evaluate causality versus correlation?
- A/B testing
  - Can quantify the "gap" between clickthroughs of the top link in different positions
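
The gap between correlation and causation here can be illustrated with a small simulation. This is a minimal sketch under made-up assumptions: the click model (`0.1 + 0.3 * quality + 0.2` for the top slot) and the function name `simulate_clicks` are hypothetical and not from the talk. When a ranker places good links on top, the observed clickthrough gap mixes the position effect with link quality; when position is randomized, as in an A/B test, the gap should be close to the position effect alone.

```python
import random

def simulate_clicks(n, randomize, seed=0):
    """Return (top_ctr, lower_ctr) over n impressions.

    Hypothetical click model: P(click) = 0.1 + 0.3 * quality + 0.2 if shown on top.
    """
    rng = random.Random(seed)
    clicks = {True: 0, False: 0}
    shows = {True: 0, False: 0}
    for _ in range(n):
        quality = rng.random()            # latent link quality in [0, 1)
        if randomize:
            top = rng.random() < 0.5      # A/B test: position is a coin flip
        else:
            top = quality > 0.5           # ranker: good links go on top
        p_click = 0.1 + 0.3 * quality + (0.2 if top else 0.0)
        shows[top] += 1
        clicks[top] += rng.random() < p_click
    return clicks[True] / shows[True], clicks[False] / shows[False]

top, low = simulate_clicks(200_000, randomize=False)
naive_gap = top - low                     # position effect plus quality difference
top, low = simulate_clicks(200_000, randomize=True)
ab_gap = top - low                        # should be near the assumed 0.2 position effect
print(f"observational gap: {naive_gap:.3f}, randomized gap: {ab_gap:.3f}")
```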

### Conventions and Approaches

1. Come up with a way to separate correlations from causality (deal with confounders)
  - Randomized experiment
  - "Natural experiments"
    - E.g., to learn about the effects of military service, exploit the randomness of draft numbers in the Vietnam draft
    - If one state changes a law and a neighboring state doesn't
    - __Caveat:__ sometimes difficult to identify a control group or construct a synthetic one
  - Assume that agents respond optimally to confounders, and infer them
    - For instance, we don't know exactly how much people value the Staples brand, so the brand component of pricing is hard to estimate directly. But if we assume Staples has economists designing profit-maximizing pricing models, we can infer some of the latent factors from the prices it sets.
2. Estimate a model
  - *__NOT__ the best in-sample fit!!!!*
  - Focus on estimation of treatment effect parameters or predictions about the impact of a treatment

__Key differences with supervised machine learning__
1. Train/test paradigm breaks
  - Ground truth is not observed for your ultimate goal
  - If we give half the room coffee, and not the other:
    - We don't know how alert one half would have been *without* it
    - We don't know how alert the other half of the room would have been *with* it
  - Ultimately a missing data problem
    - We never observe both outcomes for the same person, so we can't draw inferences about individual effects directly
  - Even with randomized experiments, can't have good predictions at personal level, just group-wide level
2. The objective function is different
  - Trying to predict the effect of a treatment rather than an outcome
  - Sacrifice MSE of outcome predictions
    - For the sake of pursuing better MSE on out-of-sample predictions on effect
3. Statistical properties are often table stakes
  - Want to prove the effect is real, and not just sampling variation
  - However, this requirement can be an impediment to the progress of science
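
The coffee example can be sketched as a simulation to make the missing-data point concrete. Everything numeric here is an assumption for illustration (alertness scores, an average effect of 5 with individual variation). In the simulation both potential outcomes exist for every person, but the "observed" data keeps only one of them, as in a real experiment. The group-level difference in means should land close to the true average effect, while no individual's effect can be computed from the observed data.

```python
import random
from statistics import mean

rng = random.Random(1)
n = 10_000

# Hypothetical potential outcomes (alertness scores) for every person.
y0 = [rng.gauss(50, 10) for _ in range(n)]        # alertness without coffee
effect = [rng.gauss(5, 3) for _ in range(n)]      # individual effects vary
y1 = [a + e for a, e in zip(y0, effect)]          # alertness with coffee

# Randomly give half the room coffee; only one outcome per person is observed.
w = [rng.random() < 0.5 for _ in range(n)]
y_obs = [y1[i] if w[i] else y0[i] for i in range(n)]

true_ate = mean(effect)                           # knowable only in a simulation
est_ate = (mean(y for y, t in zip(y_obs, w) if t)
           - mean(y for y, t in zip(y_obs, w) if not t))
print(f"true ATE: {true_ate:.2f}, difference in means: {est_ate:.2f}")
```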

__Key similarities with supervised machine learning__
1. Prediction is key component of causal inference
2. Causal inference in big data settings benefits from flexible model selection
3. Problems of causal inference are prevalent in settings where ML is used
  - What is the effect of the position of an ad or an algorithmic link in search?
  - How many clicks would an ad receive if it were placed in the first position?
  - Personalized recommendations or policies
4. Statistical issues have analogs in ML


## Models for Causal Inference

__Potential outcomes__: 
1. **_Y<sub>i</sub>(w)_** is the outcome unit *i* would have if assigned treatment *w*
2. Binary treatments
  - Treatment effect is **_T<sub>i</sub> = Y<sub>i</sub>(1) - Y<sub>i</sub>(0)_**
  - Average treatment effect: **_ATE = E[T<sub>i</sub>]_**
  
__Randomized experiments__
- Gold standard for causal inference
- Two samples; treatment assignment independent of potential outcomes
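
A short derivation of why randomization identifies the average treatment effect. The symbol *W<sub>i</sub>* for the treatment actually assigned to unit *i*, and the observed outcome *Y<sub>i</sub> = Y<sub>i</sub>(W<sub>i</sub>)*, are notation assumed here rather than taken from the talk:

$$
\begin{aligned}
E[Y_i \mid W_i = 1] - E[Y_i \mid W_i = 0]
  &= E[Y_i(1) \mid W_i = 1] - E[Y_i(0) \mid W_i = 0] \\
  &= E[Y_i(1)] - E[Y_i(0)] \\
  &= E[T_i]
\end{aligned}
$$

The second equality is the step that uses independence of treatment assignment and potential outcomes; without it, the conditional means on the first line need not equal the unconditional ones.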


### Experimental settings
1. Reducing variance for average treatment effect estimation
  - Individuals may be very different
    - Large v. small advertisers in search
    - Heavy v. light users
  - Groups may be imbalanced due to sampling variation
    - Carefully design stratification of samples
2. Estimating heterogeneous treatment effects and optimal treatment policies
  - Discover a partition and test hypotheses about treatment effects
    - Back to careful strata design
      - Careful to personalize
  - Nonparametric models of heterogeneous treatment effects
  - Optimal (personalized) policies
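
The variance-reduction point about stratification can be illustrated with a small simulation. This is a sketch under assumed numbers: two hypothetical user types ("light" and "heavy") with very different baselines and a constant treatment effect of 2. With simple randomization, chance imbalance in how many heavy users land in the treated group adds a lot of noise; randomizing within each stratum removes that source of variation.

```python
import random
from statistics import mean, pvariance

BASELINES = {"light": 10.0, "heavy": 100.0}   # hypothetical user types
TRUE_EFFECT = 2.0                              # hypothetical constant effect

def run_experiment(rng, stratified, n_per_stratum=100):
    """Simulate one experiment; return the estimated average treatment effect."""
    strata = [s for s in BASELINES for _ in range(n_per_stratum)]
    base = [BASELINES[s] + rng.gauss(0, 1) for s in strata]
    n = len(strata)
    if stratified:
        # Treat exactly half of the units within each stratum.
        treated = set()
        for s in BASELINES:
            members = [i for i in range(n) if strata[i] == s]
            treated.update(rng.sample(members, len(members) // 2))
    else:
        # Treat half of all units, ignoring strata.
        treated = set(rng.sample(range(n), n // 2))
    y = [base[i] + (TRUE_EFFECT if i in treated else 0.0) for i in range(n)]

    def diff(indices):
        t = [y[i] for i in indices if i in treated]
        c = [y[i] for i in indices if i not in treated]
        return mean(t) - mean(c)

    if stratified:
        # Strata are equal-sized here, so a plain average of the
        # within-stratum differences is the size-weighted average.
        return mean(diff([i for i in range(n) if strata[i] == s])
                    for s in BASELINES)
    return diff(range(n))

rng = random.Random(2)
simple = [run_experiment(rng, stratified=False) for _ in range(200)]
blocked = [run_experiment(rng, stratified=True) for _ in range(200)]
var_simple = pvariance(simple)
var_blocked = pvariance(blocked)
print(f"variance, simple: {var_simple:.3f}; stratified: {var_blocked:.3f}")
```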
  
### Observational studies
1. Estimating average treatment effects under unconfoundedness
2. Instrumental variables
  - Identify variables that shift the treatment assignment but are not otherwise correlated with the outcome
    - Back to the random draft number example: it affects the treatment assignment, but is not otherwise correlated with the outcome
  - Use only the variation in treatment assignment that is explained by the instrument
  - Sacrifice goodness of fit for causal inference
  - Note drawbacks:
    - What about those who were going to go to Vietnam no matter what?
    - What about those who were *not* going to go to Vietnam no matter what?
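
The draft-lottery logic can be sketched with the simplest instrumental variables estimator, the Wald ratio (the effect of the instrument on the outcome divided by its effect on treatment). The data-generating process below is entirely hypothetical: an unobserved confounder `u` drives both service and the outcome, some people serve regardless of their draft number, some never serve, and the rest comply with the lottery. The treatment effect is assumed constant, so the distinction raised in the drawbacks above (the instrument is only informative about those whose behavior it changes) does not show up in the numbers here, though it would with heterogeneous effects.

```python
import random
from statistics import mean

TRUE_EFFECT = -2.0   # hypothetical constant effect of service on a later outcome

def simulate(n=50_000, seed=3):
    """Return rows of (z, served, y) from a made-up draft-lottery model."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        u = rng.gauss(0, 1)              # unobserved confounder
        z = rng.random() < 0.5           # instrument: random low draft number
        if u > 1.0:
            served = True                # would serve no matter what
        elif u < -1.0:
            served = False               # would not serve no matter what
        else:
            served = z                   # serves only if drafted
        y = 10 + TRUE_EFFECT * served + 3 * u + rng.gauss(0, 1)
        rows.append((z, served, y))
    return rows

def naive_estimate(rows):
    """Compare those who served with those who did not (confounded by u)."""
    return (mean(y for _, d, y in rows if d)
            - mean(y for _, d, y in rows if not d))

def wald_estimate(rows):
    """Use only the variation in service that the draft number explains."""
    reduced_form = (mean(y for z, _, y in rows if z)
                    - mean(y for z, _, y in rows if not z))
    first_stage = (mean(float(d) for z, d, _ in rows if z)
                   - mean(float(d) for z, d, _ in rows if not z))
    return reduced_form / first_stage

rows = simulate()
naive = naive_estimate(rows)
wald = wald_estimate(rows)
print(f"naive: {naive:.2f}, Wald IV: {wald:.2f}, truth: {TRUE_EFFECT}")
```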
    
### Structural Models
Models for events that have not yet occurred...
- What if these two firms merged?

1. Treatments never seen before
2. Welfare calculations
3. Answering these questions requires:
  - Agent preferences
    - Inferred using recorded preferences and making assumptions
  - Behavioral/equilibrium model for counterfactual world
4. Applications use structural equations approach
  - More complicated than potential outcomes notation
  - Typically model latent variables directly

### Experiments and Data-Mining

Concerns about ex-post "data-mining" for heterogeneous treatment effects

__Beware of *p*-value hacking!__

## Causal Trees: CART for Causal Inference

Within a leaf, estimate treatment effect rather than a mean
  - Difference in average outcomes for treated and control group
  - Weight by inverse propensity score in observational studies
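
A minimal sketch of the leaf-level estimate described above. The function name `leaf_effect` is hypothetical, and the use of normalized inverse propensity weights (weights rescaled to sum to one within each group) is one common choice assumed here, not necessarily the weighting used in the talk. Without propensities it reduces to a plain difference in means.

```python
def leaf_effect(y, w, propensity=None):
    """Estimate the treatment effect among the units in one leaf.

    y: outcomes; w: treatment indicators (truthy = treated);
    propensity: optional P(treated) per unit, for observational data.
    """
    if propensity is None:
        # Randomized experiment: difference in average outcomes.
        treated = [yi for yi, wi in zip(y, w) if wi]
        control = [yi for yi, wi in zip(y, w) if not wi]
        return sum(treated) / len(treated) - sum(control) / len(control)

    # Observational study: weight treated units by 1/e and
    # control units by 1/(1 - e), normalizing within each group.
    num_t = sum(yi / e for yi, wi, e in zip(y, w, propensity) if wi)
    den_t = sum(1 / e for wi, e in zip(w, propensity) if wi)
    num_c = sum(yi / (1 - e) for yi, wi, e in zip(y, w, propensity) if not wi)
    den_c = sum(1 / (1 - e) for wi, e in zip(w, propensity) if not wi)
    return num_t / den_t - num_c / den_c

# Usage with toy numbers: two treated units, two controls.
plain = leaf_effect([3, 5, 1, 1], [1, 1, 0, 0])
weighted = leaf_effect([4, 2, 1], [1, 1, 0], propensity=[0.5, 0.25, 0.5])
```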
  
What is your goal? MSE of *treatment effects*

$$E_{S^{te}}\left[ \sum_{i \in S^{te}} \left( T_{i} - \hat{T}(X_{i}) \right)^{2} \right]$$

Here *S<sup>te</sup>* denotes the test sample.

### Honest causal trees

We also modify CART to be "honest."
  - Decouple model selection from model estimation
    - Split sample, one sample to build tree, second to estimate effects
  - Criteria for splitting and CV changes
    - Given set of leaves, MSE on test set taking into account re-estimation
    - Uncertainty over estimation set and test set at time of evaluation
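
The sample-splitting idea can be sketched with a one-split "tree" in plain Python. This is an illustration under stated assumptions, not the algorithm from the talk: the splitting score used here (leaf size times squared estimated effect, summed over leaves) is a simplified stand-in that ignores the variance adjustment and cross-validation mentioned above, and the data, function names, and candidate thresholds are all hypothetical. The point it shows is the decoupling: one half of the data picks the split, the other half estimates the effects in the resulting leaves.

```python
import random

def diff_in_means(rows):
    """Treated-minus-control mean outcome; rows are (x, w, y) tuples."""
    treated = [y for _, w, y in rows if w]
    control = [y for _, w, y in rows if not w]
    return sum(treated) / len(treated) - sum(control) / len(control)

def choose_split(rows, candidates, min_group=20):
    """Pick the threshold with the highest simplified score on these rows."""
    best, best_score = None, float("-inf")
    for c in candidates:
        leaves = ([r for r in rows if r[0] <= c], [r for r in rows if r[0] > c])
        # Require enough treated and control units in both leaves.
        if any(sum(1 for r in leaf if r[1] == g) < min_group
               for leaf in leaves for g in (True, False)):
            continue
        score = sum(len(leaf) * diff_in_means(leaf) ** 2 for leaf in leaves)
        if score > best_score:
            best, best_score = c, score
    return best

def honest_stump(rows, candidates, rng):
    """Build the split on one half, estimate leaf effects on the other half."""
    rows = rows[:]
    rng.shuffle(rows)
    half = len(rows) // 2
    build, estimate = rows[:half], rows[half:]
    split = choose_split(build, candidates)      # model selection
    left = diff_in_means([r for r in estimate if r[0] <= split])   # estimation
    right = diff_in_means([r for r in estimate if r[0] > split])
    return split, left, right

# Hypothetical data: the treatment only works for units with x > 0.5.
rng = random.Random(4)
data = []
for _ in range(8000):
    x = rng.random()
    w = rng.random() < 0.5
    effect = 4.0 if x > 0.5 else 0.0
    data.append((x, w, effect * w + rng.gauss(0, 1)))

split, left_effect, right_effect = honest_stump(
    data, [i / 10 for i in range(1, 10)], rng)
print(f"split at {split}: left {left_effect:.2f}, right {right_effect:.2f}")
```

Because the leaf estimates come from data that played no role in choosing the split, they are not biased by the search over thresholds, at the cost of using only half the sample for each task.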

### Causal Forests
Causal trees can be adapted to causal forests (analogous to random forests)
- Honest: two subsamples, one for tree construction, one for estimating treatment effects at leaves