# 7.2. Average treatment effects

In experiments in which only one treatment is applied to each individual, it will not be possible to estimate individual-level causal effects $\tau_i = y_{i1} - y_{i0}$. 

The individual causal effect can in general vary by person; hence, any definition of average causal
effect will depend on what group of people is being averaged over.

Basic types of average causal effects:

### Sample average treatment effect (SATE)

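The SATE is the average of the individual effects over the whole sample:

$$\tau_{\text{SATE}} = \frac{1}{n}\sum_{i=1}^{n} (y_{i1} - y_{i0}) = \text{mean}(y_{i1}) - \text{mean}(y_{i0})$$

Here $y_{i1}$ is the outcome if treated and $y_{i0}$ is the outcome if control.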
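A tiny numeric illustration of the SATE may help. The potential outcomes below are hypothetical values chosen only for this sketch; in a real experiment we would never observe both columns for the same unit.

```python
import numpy as np

# Hypothetical potential outcomes for five units (illustrative values only)
y0 = np.array([10.0, 12.0, 9.0, 11.0, 8.0])    # outcome if control
y1 = np.array([13.0, 12.0, 14.0, 10.0, 11.0])  # outcome if treated

tau_i = y1 - y0      # individual-level effects, unobservable in practice
sate = tau_i.mean()  # sample average treatment effect

# The mean of the differences equals the difference of the means
diff_of_means = y1.mean() - y0.mean()
print(sate, diff_of_means)
```

Note that the individual effects differ from unit to unit (one is even negative), while the SATE summarizes them in a single number.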


### Conditional average treatment effect (CATE)

We can also calculate the average treatment effect for well-defined subsets such as men, women, or, for instance, 50-year-olds. These estimands are sometimes referred to as conditional
average treatment effects (CATEs) and can also take more complicated forms such as expectations
(average predictions) from linear regression models.
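As a minimal sketch of the idea, a CATE is just the same average restricted to a subset. The group labels and potential outcomes below are hypothetical and serve only as an illustration.

```python
import numpy as np

# Hypothetical group labels and potential outcomes (illustrative values only)
group = np.array(["men", "men", "women", "women", "women"])
y0 = np.array([10.0, 12.0, 9.0, 11.0, 8.0])
y1 = np.array([13.0, 12.0, 14.0, 10.0, 11.0])

tau_i = y1 - y0

# CATE for each group: average individual effect among units in that group
cate = {g: tau_i[group == g].mean() for g in np.unique(group)}
print(cate)
```

The two subgroup averages differ from each other and from the overall SATE, which is why the group being averaged over has to be stated explicitly.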

### Population average treatment effect (PATE)

Similar to the SATE, but averaged over the whole population of interest. We obviously do not have data for the whole population, but if our study sample is a random sample from the population of interest, then any unbiased estimate of the SATE is also an unbiased estimate of the PATE.
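The link between the SATE and the PATE can be checked in a small simulation. This is a sketch under stated assumptions: the population size, the effect distribution, and the seed are all hypothetical choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical finite population with heterogeneous treatment effects
pop_size = 100_000
y0_pop = rng.normal(10, 2, pop_size)
tau_pop = rng.normal(5, 3, pop_size)
y1_pop = y0_pop + tau_pop
pate = (y1_pop - y0_pop).mean()

# Draw many random samples and compute the (true) SATE of each one
sates = []
for _ in range(2000):
    idx = rng.choice(pop_size, size=100, replace=False)
    sates.append((y1_pop[idx] - y0_pop[idx]).mean())
mean_sate = np.mean(sates)

print(pate, mean_sate)
```

Each sample has its own SATE, but across random samples these values are centered on the PATE.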


In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.formula.api as smf

In [None]:
# simulating the same data as in the practice 7.1

sample_size = 100
# age column
age = np.random.normal(40, 10, sample_size)
# converting age column to int
age = age.astype(int)
# 3 groups: under 30, 30-44, 45 and over
# this is an arbitrary division into groups; in real research such choices are guided by theory/prior knowledge
age_groups = np.copy(age)
age_groups[age < 30] = 1
age_groups[(age >= 30) & (age < 45)] = 2
age_groups[age >= 45] = 3

data_age_groups = pd.DataFrame(age_groups, columns=['age_group'])
np.random.seed(10)
y_if_control = np.random.normal(10, 2, size=sample_size)
y_if_treated = np.copy(y_if_control)

data_age_groups['y0'] = y_if_control # y if control
data_age_groups['y1'] = y_if_treated # y if treated; not the final version yet, the treatment effect is added below

data_age_groups['y1'] = data_age_groups.apply(lambda x: x.y1 - 2 if x.age_group==1 
                                              else x.y1 + 5 if x.age_group==2 
                                              else x.y1 + 10, 
                                              axis=1)
# assigning treatment to each person with probability 0.3
z = np.random.binomial(1, 0.3, size=len(data_age_groups))
data_age_groups['z'] = z

Let's calculate the SATE and CATE for our simulated data (the age group data from practice 7.1).

**As a difference in mean outcomes between the treated and control groups**

In [None]:
y_t = data_age_groups[data_age_groups["z"]==1]['y1'] # y_treated
y_c = data_age_groups[data_age_groups["z"]==0]['y0'] # y_control

In [None]:
# SATE estimate: difference in means between treated and control units
te_sate = y_t.mean() - y_c.mean() 
te_sate

In [None]:
# SATE standard error (sample variances, ddof=1)
se_te_sate = np.sqrt(y_t.var(ddof=1) / len(y_t) + y_c.var(ddof=1) / len(y_c))
se_te_sate

In [None]:
# true SATE; computable only because the simulation gives us both potential outcomes
data_age_groups['y1'].mean() - data_age_groups['y0'].mean()

In [None]:
# CATE by age group: mean y1 among treated minus mean y0 among controls, within each group
def group_te(df, group):
    in_group = df["age_group"] == group
    return df.loc[in_group & (df["z"] == 1), "y1"].mean() - df.loc[in_group & (df["z"] == 0), "y0"].mean()

te_group1 = group_te(data_age_groups, 1)
te_group2 = group_te(data_age_groups, 2)
te_group3 = group_te(data_age_groups, 3)

In [None]:
te_group1, te_group2, te_group3

**With regression**

In [None]:
# observed outcome: y0 for control units, y1 for treated units
data_age_groups['y_real'] = np.where(data_age_groups['z'] == 0, data_age_groups['y0'], data_age_groups['y1'])

In [None]:
# SATE: regression on the whole sample
m_sate = smf.ols(formula="y_real ~ z", 
                 data=data_age_groups).fit()
m_sate.summary()

In [None]:
# CATE for age group 1
m_sate1 = smf.ols(formula="y_real ~ z", 
                 data=data_age_groups, subset=data_age_groups['age_group']==1).fit()
m_sate1.summary()

In [None]:
# CATE for age group 2
m_sate2 = smf.ols(formula="y_real ~ z", 
                 data=data_age_groups, subset=data_age_groups['age_group']==2).fit()
m_sate2.summary()

In [None]:
# CATE for age group 3
m_sate3 = smf.ols(formula="y_real ~ z", 
                 data=data_age_groups, subset=data_age_groups['age_group']==3).fit()
m_sate3.summary()

### Randomization distribution

An unbiased estimate leads us to the right answer on average. The properties of a
statistical procedure are reflected in the distribution of the estimate over repeated samples from the
population. That is, we can envision taking an infinite number of samples from the population and,
for each sample, calculating the estimate. The distribution of these estimates is the **sampling distribution.**

First of all, we can simplify matters by considering all covariates
and potential outcomes to be fixed (this is a representation common both to the survey sampling
world and the randomization-based inference framework). 

Then imagine *randomly allocating observations to treatment groups again and again*. *Each new allocation will imply a different set of observed outcomes (since observed outcomes are a function of both potential outcomes and treatment assignment)*. Suppose that with each re-randomization the difference in mean outcomes
between the treatment and control groups is calculated. The set of these estimates represents the **randomization distribution for this estimate**. If the estimate is unbiased, then the average of all of these estimates (the mean of the randomization distribution) equals the true sample average treatment effect.
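For a very small sample, the randomization distribution can be enumerated exactly instead of simulated. The sketch below assumes complete randomization with a fixed number of treated units (a design choice made for this illustration) and uses hypothetical potential outcomes for six units.

```python
import itertools
import numpy as np

# Hypothetical fixed potential outcomes for six units (illustrative values only)
y0 = np.array([10.0, 12.0, 9.0, 11.0, 8.0, 13.0])
y1 = np.array([13.0, 12.0, 14.0, 10.0, 11.0, 15.0])
n, n_treated = len(y0), 3
true_sate = (y1 - y0).mean()

# Enumerate every way of treating exactly 3 of the 6 units
estimates = []
for treated in itertools.combinations(range(n), n_treated):
    z = np.zeros(n, dtype=bool)
    z[list(treated)] = True
    # Only y1 is observed for treated units and only y0 for control units
    estimates.append(y1[z].mean() - y0[~z].mean())
estimates = np.array(estimates)

print(len(estimates), estimates.mean(), true_sate)
```

Individual allocations give estimates that scatter around the truth, but their average over all possible allocations equals the true SATE, which is what unbiasedness means here.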

Let's write a loop that builds the randomization distribution for an estimate of the TE in our data with age groups:

In [None]:
n_trials = 1000

te_distr = []

for i in range(n_trials):
    z = np.random.binomial(1, 0.3, len(data_age_groups))
    te = np.mean(data_age_groups.loc[z==1, 'y1']) - np.mean(data_age_groups.loc[z==0, 'y0'])
    te_distr.append(te)
    
plt.hist(te_distr)

In [None]:
# sample average treatment effect and std
np.mean(te_distr), np.std(te_distr)

### Sampling distribution

We can also estimate the TE with a different method:
+ N times, draw a random subset of our data
+ estimate the average TE on each subset

We then obtain a sampling distribution that contains the average TE for each draw, and we can take its mean and standard deviation.

In [None]:
n_trials = 1000
n_samples = 10

te_distr = []

for i in range(n_trials):
    chosen = np.random.choice(len(data_age_groups), size=n_samples)
    data_sampled = data_age_groups.iloc[chosen].copy()
    te = np.mean(data_sampled['y1']) - np.mean(data_sampled['y0'])
    te_distr.append(te)
    
plt.hist(te_distr)

In [None]:
# sample average treatment effect and std
np.mean(te_distr), np.std(te_distr)

## Little summary: calculating average treatment effects

There are two main methods to calculate the TE (SATE, CATE, or the TE for a particular group):
+ as a difference in mean outcomes between the treated and control groups
+ with linear regression

We can also estimate the TE with a sampling distribution (sample a random part of the data, calculate the TE, repeat N times).

The randomization distribution can be computed only if we have both y0 and y1 for each observation, which is never the case in real research.

In an ideal world, we could calculate the true TE (because we would know the counterfactual outcomes). We cannot do this in real research, but we can do it in simulations.

## Example of simple simulation

We demonstrate with an artificial example of a randomized experiment on 100 students designed to
test an intervention for improving final exam scores.

### Simulating a randomized experiment
We start by assigning the potential outcomes, the final exam scores that would be observed for each
student if he or she gets the control or the treatment:


In [None]:
np.random.seed(10)
n = 100
y_if_control = np.random.normal(60, 20, size=n)
y_if_treated = y_if_control + 5

In this very simple model, the intervention would add 5 points to each student’s score.
We then assign treatments (z = 0 for control or 1 for treatment), which then determine which
outcome is observed for each person:

In [None]:
z = np.random.binomial(1, 0.5, n)

# observed outcome: y0 for control units, y1 for treated units
y = np.where(z == 0, y_if_control, y_if_treated)

Having simulated the data, we can now compare treated to control outcomes and compute the standard
error for the difference:

In [None]:
diff = np.mean(y[z==1]) - np.mean(y[z==0])
se_diff = np.sqrt(np.var(y[z==1], ddof=1) / np.sum(z==1) + np.var(y[z==0], ddof=1) / np.sum(z==0))

print(diff, se_diff)
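For reference, the standard error computed in the cell above follows the usual formula for a difference in means of two independent groups, where $s_1, s_0$ are the standard deviations of the observed outcomes and $n_1, n_0$ the numbers of units in the treatment and control groups (this notation is introduced here only for the formula):

$$\text{se}(\hat{\tau}) = \sqrt{\frac{s_1^2}{n_1} + \frac{s_0^2}{n_0}}$$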

Equivalently, we can run the regression:

In [None]:
data = pd.DataFrame()
data['y'] = y
data['z'] = z

In [None]:
model =  smf.ols(formula="y ~ z", 
                 data=data).fit()
model.summary()
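The equivalence between the simple comparison and the regression can be verified numerically. The sketch below deliberately swaps `smf.ols` for plain least squares via `np.linalg.lstsq` so that it depends only on NumPy; the seed and data are a fresh simulation in the spirit of the one above, not the same draws.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
y_if_control = rng.normal(60, 20, n)
y_if_treated = y_if_control + 5
z = rng.binomial(1, 0.5, n)
y = np.where(z == 1, y_if_treated, y_if_control)

# Simple comparison of group means
diff = y[z == 1].mean() - y[z == 0].mean()

# Least-squares fit of y on an intercept and the treatment indicator
X = np.column_stack([np.ones(n), z])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

print(diff, coef[1])
```

The intercept reproduces the control-group mean and the coefficient on `z` reproduces the difference in means, up to floating-point error.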

### Including a pre-treatment predictor
Suppose we also have information about pre-test scores x. They have the same distribution as post-test scores y but with a slightly lower average.

In [None]:
data['x'] = np.random.normal(50, 20, size=n)

We can then adjust for pre-test in our regression:

In [None]:
model =  smf.ols(formula="y ~ z + x", 
                 data=data).fit()
model.summary()

Again, the coefficient of z estimates the treatment effect, and it still has a standard error of about
4, which might seem surprising: shouldn’t the inclusion of a pre-treatment predictor increase the
precision of our estimate? The answer is that, the way we constructed the pre-test variable, it wasn’t
much of a pre-treatment predictor at all, as we simulated it independently of the potential outcomes
for the final test score.
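A quick way to see why this covariate cannot help is to look at its correlation with the outcome. The sketch below uses a larger sample than the `n = 100` in the text, purely to make the correlation estimate stable; the seed and variable names are assumptions of this illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 10_000  # larger than in the text, only to stabilize the estimate

y_if_control = rng.normal(60, 20, n)
x_independent = rng.normal(50, 20, n)  # drawn without any reference to y_if_control

# Sample correlation between the "pre-test" and the potential outcome
corr = np.corrcoef(x_independent, y_if_control)[0, 1]
print(corr)
```

A covariate with essentially zero correlation with the outcome explains none of its variance, so adjusting for it leaves the residual standard deviation, and hence the standard error of the treatment effect, roughly unchanged.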


To perform a realistic simulation, we must simulate both test scores in a correlated way, which we
do here by borrowing a trick from the example of simulated midterm and final exams:

1. Each student is assumed to have a true ability drawn from a distribution with mean 50 and standard
deviation 16.
2. Each student’s score on the pre-test, x, is the sum of two components: the student’s true ability,
and a random component with mean 0 and standard deviation 12, reflecting that performance on
any given test will be unpredictable.
3. Each student’s score on the post-test, y, is his or her true ability, plus another, independent, random
component, plus an additional 10 points if a student receives the control or 15 points if he or she
receives the treatment.

In [None]:
np.random.seed(10)

n = 100
true_ability = np.random.normal(50, 16, n)
x = true_ability + np.random.normal(0, 12, n)
y_if_control = true_ability + np.random.normal(0, 12, n) + 10  # +10 points under control
y_if_treated = y_if_control + 5  # 15 points in total under treatment, i.e. a treatment effect of 5

As above, we assign treatments, construct the observed outcome, and put the data into a frame:

In [None]:
z = np.random.binomial(1, 0.5, n)
y = np.where(z == 0, y_if_control, y_if_treated)  # observed outcome

data = pd.DataFrame()
data['y'] = y
data['z'] = z
data['x'] = x

The simple comparison is equivalent to a regression on the treatment indicator:

In [None]:
model =  smf.ols(formula="y ~ z", 
                 data=data).fit()
model.summary()

And the estimate adjusting for pre-test:

In [None]:
model =  smf.ols(formula="y ~ z + x", 
                 data=data).fit()
model.summary()

In this case, with the strong dependence between pre-test and post-test, this adjustment has reduced
the residual standard deviation by about a third.
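The gain in precision can also be read directly from the standard errors of the coefficient of z. The sketch below is a NumPy-only version that follows the three numbered steps in the text; the helper `ols_se`, the seed, and the use of classical (homoskedastic) standard errors are assumptions of this illustration rather than part of the original analysis.

```python
import numpy as np

def ols_se(X, y):
    """Return OLS coefficients and their classical standard errors."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    dof = X.shape[0] - X.shape[1]
    sigma2 = resid @ resid / dof          # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)  # covariance of the coefficients
    return coef, np.sqrt(np.diag(cov))

rng = np.random.default_rng(2)
n = 100
true_ability = rng.normal(50, 16, n)
x = true_ability + rng.normal(0, 12, n)                  # pre-test
y_if_control = true_ability + rng.normal(0, 12, n) + 10  # post-test under control
y_if_treated = y_if_control + 5                          # post-test under treatment
z = rng.binomial(1, 0.5, n)
y = np.where(z == 1, y_if_treated, y_if_control)

ones = np.ones(n)
_, se_unadj = ols_se(np.column_stack([ones, z]), y)
_, se_adj = ols_se(np.column_stack([ones, z, x]), y)
se_z_unadj, se_z_adj = se_unadj[1], se_adj[1]

print(se_z_unadj, se_z_adj)
```

Because the pre-test shares the `true_ability` component with the post-test, adjusting for it removes part of the outcome variance and the standard error of the treatment effect shrinks.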

## Task for you

Send your solution by e-mail to aspestova@hse.ru in **html** format.

(For a better understanding, read chapter 18 of the Gelman book: https://users.aalto.fi/~ave/ROS.pdf)


1. Simulate the data from table 18.5 in the Gelman book (page 348)
2. Try different treatment assignments:
+ completely random treatment assignment
+ block randomized experiment: divide participants into 4 blocks by age (40-year-olds, 50-year-olds, 60-year-olds, 70-year-olds). In the first two blocks (Audrey through Brad), which contain the younger participants, the probability of receiving the supplements is 0.25 under this design, whereas in the last two blocks, with the older participants, this probability is 0.75.


Calculations on this simulated data:

3. Calculate the SATE and CATE (for the same groups defined earlier) for both treatment assignment designs
4. Using the first treatment assignment design, fit an outcome regression for several draws, with and without covariates (sex, age).
5. Describe results and the differences between the models (TE, standard errors). 