In [1]:
import pandas as pd
import numpy as np
import a2functions as f

## 1. Simulate a DGP where the outcome of interest depends on a randomly assigned treatment and some observed covariates. How does your estimate of the treatment effect parameter compare in the following two cases?

### a. You do not control for any covariates

The DGP is given in detail by the code below. The DAG is w -> y, x -> y, where w is the treatment, x the observed covariate and y the outcome.
We run the Monte Carlo experiment with sample sizes N=100 and N=1000 and report the bias, RMSE, standard error and size of the treatment effect estimate.
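As a rough illustration of this setup, here is a minimal sketch of a DGP with a randomized binary treatment and one covariate, estimated by a simple difference in means. It is not the code in `a2functions`: the function names, the linear form and the unit coefficients are assumptions, and the true effect of 1.5 is only inferred from the printed outputs (`estimated_effect - bias`).

```python
import numpy as np

TRUE_EFFECT = 1.5  # assumption: inferred from estimated_effect - bias in the outputs


def simulate_randomized(n, seed=0):
    """Hypothetical DGP: y = TRUE_EFFECT * w + x + noise, with w randomized."""
    rng = np.random.default_rng(seed)
    w = rng.integers(0, 2, size=n)   # randomly assigned binary treatment
    x = rng.normal(size=n)           # observed covariate, independent of w
    y = TRUE_EFFECT * w + x + rng.normal(size=n)
    return w, x, y


def difference_in_means(w, y):
    """Treatment effect estimate that ignores every covariate."""
    return y[w == 1].mean() - y[w == 0].mean()


w, x, y = simulate_randomized(1000)
estimate = difference_in_means(w, y)
print({"estimated_effect": estimate, "bias": estimate - TRUE_EFFECT})
```

Because w is randomized, the uncontrolled covariate only adds noise to the estimate in this sketch; it does not shift it systematically.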

In [2]:
# Simulate N=100 observations from DGP 1a and save them to CSV
df = f.generate_dataset1a(100)
df.to_csv('data1a100.csv', index=False)

In [3]:
f.estimate_treatmenteffect(df)

{'estimated_effect': 1.4043630531275282,
 'bias': -0.09563694687247182,
 'rmse': 0.3092522382659046,
 'standard_error': 0.4350247762252514}

In [4]:
# Simulate N=1000 observations from DGP 1a and save them to CSV
df = f.generate_dataset1a(1000)
df.to_csv('data1a1000.csv', index=False)

In [5]:
f.estimate_treatmenteffect(df)

{'estimated_effect': 1.5630011202928156,
 'bias': 0.06300112029281557,
 'rmse': 0.2510002396270083,
 'standard_error': 0.13048308815694237}
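For reference, the Monte Carlo summaries named above can be computed over repeated draws as in the sketch below. This is a generic illustration, not the implementation in `a2functions`: the DGP, the helper names, the number of replications and the 5% normal critical value are all assumptions. For a single replication the RMSE reduces to |bias|. The `rmse` values printed in this notebook look like the square root of |bias| (for example, 0.3093² ≈ 0.0956), which may be worth checking against this definition.

```python
import numpy as np

TRUE_EFFECT = 1.5  # assumption: inferred from estimated_effect - bias in the outputs


def one_replication(n, rng):
    """Draw one sample and return the difference in means and its standard error."""
    w = rng.integers(0, 2, size=n)
    y = TRUE_EFFECT * w + rng.normal(size=n)
    y1, y0 = y[w == 1], y[w == 0]
    estimate = y1.mean() - y0.mean()
    se = np.sqrt(y1.var(ddof=1) / y1.size + y0.var(ddof=1) / y0.size)
    return estimate, se


def monte_carlo(n, reps=2000, seed=0):
    """Bias, RMSE, Monte Carlo standard error and size over `reps` replications."""
    rng = np.random.default_rng(seed)
    draws = np.array([one_replication(n, rng) for _ in range(reps)])
    estimates, ses = draws[:, 0], draws[:, 1]
    errors = estimates - TRUE_EFFECT
    return {
        "bias": errors.mean(),
        "rmse": np.sqrt(np.mean(errors ** 2)),
        "standard_error": estimates.std(ddof=1),
        # size: how often the true null is rejected at the nominal 5% level
        "size": np.mean(np.abs(errors / ses) > 1.96),
    }


for n in (100, 1000):
    print(n, monte_carlo(n))
```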

### b. You control for all the covariates that affect the outcome

The DGP is given in detail by the code below. The DAG is again w -> y, x -> y, but the covariate is now held constant.
We run the Monte Carlo experiment with sample sizes N=100 and N=1000 and report the bias, RMSE, standard error and size of the treatment effect estimate.
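One common way to control for a covariate is regression adjustment, sketched below. This may differ from how `generate_dataset1b` handles the covariate (the text says it is held constant). The helper names, the coefficient of 2.0 on x and the true effect of 1.5 are assumptions made for illustration.

```python
import numpy as np

TRUE_EFFECT = 1.5  # assumption: inferred from estimated_effect - bias in the outputs


def ols_effect(w, y, controls=None):
    """Coefficient on w from an OLS regression of y on [1, w, controls]."""
    columns = [np.ones_like(y), w.astype(float)]
    if controls is not None:
        columns.append(controls)
    design = np.column_stack(columns)
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1]


def compare(n, reps=1000, seed=0):
    """Spread of the estimate across replications, without and with the covariate."""
    rng = np.random.default_rng(seed)
    unadjusted, adjusted = [], []
    for _ in range(reps):
        w = rng.integers(0, 2, size=n)
        x = rng.normal(size=n)
        y = TRUE_EFFECT * w + 2.0 * x + rng.normal(size=n)
        unadjusted.append(ols_effect(w, y))
        adjusted.append(ols_effect(w, y, controls=x))
    return np.std(unadjusted), np.std(adjusted)


sd_unadjusted, sd_adjusted = compare(100)
print({"sd_unadjusted": sd_unadjusted, "sd_adjusted": sd_adjusted})
```

In this sketch both estimators are centred on the true effect because w is randomized; controlling for x mainly reduces the variance of the estimate.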

In [6]:
# Simulate N=100 observations from DGP 1b and save them to CSV
df = f.generate_dataset1b(100)
df.to_csv('data1b100.csv', index=False)

In [7]:
f.estimate_treatmenteffect(df)

{'estimated_effect': 1.429233488582656,
 'bias': -0.07076651141734391,
 'rmse': 0.26601975756951574,
 'standard_error': 0.06206542761168939}

In [8]:
# Simulate N=1000 observations from DGP 1b and save them to CSV
df = f.generate_dataset1b(1000)
df.to_csv('data1b1000.csv', index=False)

In [9]:
f.estimate_treatmenteffect(df)

{'estimated_effect': 1.5060925828626495,
 'bias': 0.006092582862649465,
 'rmse': 0.07805499896002475,
 'standard_error': 0.020740034234520036}

## 2. Simulate a DGP with a confounder (common cause)

### a. You fail to control for the confounder

The DGP is given in detail by the code below. The DAG is w -> y, x -> y, u -> w, u -> y, where u is the confounder.
We run the Monte Carlo experiment with sample sizes N=100 and N=1000 and report the bias, RMSE, standard error and size of the treatment effect estimate.
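To see why omitting u matters, suppose (as an assumption, since the functional form is not shown here) that the outcome is linear, with x independent of w:

$$
y = \tau w + \beta x + \gamma u + \varepsilon, \qquad E[\varepsilon \mid w, x, u] = 0 .
$$

The difference in means between treated and untreated units then converges to

$$
E[y \mid w = 1] - E[y \mid w = 0] = \tau + \gamma \left( E[u \mid w = 1] - E[u \mid w = 0] \right) .
$$

Because u -> w, the second term is generally nonzero whenever $\gamma \neq 0$, and under this assumed model it does not shrink as N grows.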

In [10]:
# Simulate N=100 observations from DGP 2a and save them to CSV
df = f.generate_dataset2a(100)
df.to_csv('data2a100.csv', index=False)

In [11]:
f.estimate_treatmenteffect(df)

{'estimated_effect': 1.6247020584958918,
 'bias': 0.12470205849589178,
 'rmse': 0.35313178630065545,
 'standard_error': 0.3988645857019625}

In [12]:
# Simulate N=1000 observations from DGP 2a and save them to CSV
df = f.generate_dataset2a(1000)
df.to_csv('data2a1000.csv', index=False)

In [13]:
f.estimate_treatmenteffect(df)

{'estimated_effect': 1.6825222554555994,
 'bias': 0.1825222554555994,
 'rmse': 0.4272262345123476,
 'standard_error': 0.12938811393774372}

### b. You do control for the confounder

The DGP is given in detail by the code below. The DAG is again w -> y, x -> y, u -> w, u -> y, but the confounder is now held constant.
We run the Monte Carlo experiment with sample sizes N=100 and N=1000 and report the bias, RMSE, standard error and size of the treatment effect estimate.
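The sketch below contrasts an estimate that ignores the confounder with one that includes it as a regressor. It uses regression adjustment rather than holding u constant, so it is an alternative illustration and not the approach in `generate_dataset2b`. The DGP, the helper name and all coefficients are assumptions.

```python
import numpy as np

TRUE_EFFECT = 1.5  # assumption: inferred from estimated_effect - bias in the outputs


def ols_coefficients(y, *regressors):
    """OLS coefficients of y on an intercept followed by the given regressors."""
    design = np.column_stack([np.ones_like(y), *regressors])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


rng = np.random.default_rng(0)
n = 100_000
u = rng.normal(size=n)                           # confounder
x = rng.normal(size=n)                           # covariate affecting y only
w = (u + rng.normal(size=n) > 0).astype(float)   # u -> w
y = TRUE_EFFECT * w + x + u + rng.normal(size=n) # w -> y, x -> y, u -> y

naive = ols_coefficients(y, w)[1]        # confounder omitted
adjusted = ols_coefficients(y, w, u)[1]  # confounder included
print({"naive": naive, "adjusted": adjusted})
```

With these assumed coefficients the naive estimate overstates the effect, while the adjusted estimate is close to 1.5.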

In [14]:
# Simulate N=100 observations from DGP 2b and save them to CSV
df = f.generate_dataset2b(100)
df.to_csv('data2b100.csv', index=False)

In [15]:
f.estimate_treatmenteffect(df)

{'estimated_effect': 2.13867186851519,
 'bias': 0.63867186851519,
 'rmse': 0.7991694867268081,
 'standard_error': 0.3752970850327385}

In [16]:
# Simulate N=1000 observations from DGP 2b and save them to CSV
df = f.generate_dataset2b(1000)
df.to_csv('data2b1000.csv', index=False)

In [17]:
f.estimate_treatmenteffect(df)

{'estimated_effect': 1.4042332995029554,
 'bias': -0.09576670049704461,
 'rmse': 0.3094619532301905,
 'standard_error': 0.13195352474570415}

## 3. Simulate a DGP with selection bias into the treatment (a variable on the path from the treatment to the outcome)

### a. You control for the variable on the path from cause to effect

The DGP is given in detail by the code below. The DAG is w -> y, x -> y, s -> y, where s stands for the selection bias.
We run the Monte Carlo experiment with sample sizes N=100 and N=1000 and report the bias, RMSE, standard error and size of the treatment effect estimate.
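The sketch below shows what controlling for a variable on the path from treatment to outcome does in a linear model. Going beyond the DAG listed above, it assumes an extra edge w -> s, as the section heading suggests. The coefficients (direct effect 1.5, w -> s of 0.6, s -> y of 0.5) and the helper name are illustrative assumptions, not values taken from `a2functions`.

```python
import numpy as np

DIRECT_EFFECT = 1.5   # assumption: effect of w on y holding s fixed
W_TO_S = 0.6          # assumption: effect of w on s
S_TO_Y = 0.5          # assumption: effect of s on y
TOTAL_EFFECT = DIRECT_EFFECT + W_TO_S * S_TO_Y  # direct plus indirect path


def ols_coefficients(y, *regressors):
    """OLS coefficients of y on an intercept followed by the given regressors."""
    design = np.column_stack([np.ones_like(y), *regressors])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


rng = np.random.default_rng(0)
n = 100_000
w = rng.integers(0, 2, size=n).astype(float)   # randomized treatment
x = rng.normal(size=n)                         # covariate affecting y only
s = W_TO_S * w + rng.normal(size=n)            # variable between w and y
y = DIRECT_EFFECT * w + S_TO_Y * s + x + rng.normal(size=n)

with_s = ols_coefficients(y, w, s)[1]   # controls for s: direct effect
without_s = ols_coefficients(y, w)[1]   # omits s: total effect
print({"with_s": with_s, "without_s": without_s, "total": TOTAL_EFFECT})
```

Under these assumptions the two regressions target different quantities, so which one counts as biased depends on whether the direct or the total effect is the parameter of interest.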

In [18]:
# Simulate N=100 observations from DGP 3a and save them to CSV
df = f.generate_dataset3a(100)
df.to_csv('data3a100.csv', index=False)

In [19]:
f.estimate_treatmenteffect(df)

{'estimated_effect': 1.4850168904119831,
 'bias': -0.014983109588016852,
 'rmse': 0.12240551289879412,
 'standard_error': 0.3957673482024291}

In [20]:
# Simulate N=1000 observations from DGP 3a and save them to CSV
df = f.generate_dataset3a(1000)
df.to_csv('data3a1000.csv', index=False)

In [21]:
f.estimate_treatmenteffect(df)

{'estimated_effect': 1.5740552883909533,
 'bias': 0.07405528839095332,
 'rmse': 0.2721310132839573,
 'standard_error': 0.12559009924291892}

### b. You do not control for the variable on the path from cause to effect

The DGP is given in detail by the code below. The DAG is w -> y, x -> y, s -> y.
We run the Monte Carlo experiment with sample sizes N=100 and N=1000 and report the bias, RMSE, standard error and size of the treatment effect estimate.

In [22]:
# Simulate N=100 observations from DGP 3b and save them to CSV
df = f.generate_dataset3b(100)
df.to_csv('data3b100.csv', index=False)

In [23]:
f.estimate_treatmenteffect(df)

{'estimated_effect': 1.8901058894553024,
 'bias': 0.39010588945530245,
 'rmse': 0.624584573500901,
 'standard_error': 0.4081273792619597}

In [24]:
# Simulate N=1000 observations from DGP 3b and save them to CSV
df = f.generate_dataset3b(1000)
df.to_csv('data3b1000.csv', index=False)

In [25]:
f.estimate_treatmenteffect(df)

{'estimated_effect': 1.8227482254888616,
 'bias': 0.3227482254888616,
 'rmse': 0.5681093428987607,
 'standard_error': 0.1286393166794203}

## 4. Simulate a DGP where the outcome variable is overrepresented at 0.

The DGP is given in detail by the code below. The DAG is w -> y, x -> y, and this time most observations are untreated.
We run the Monte Carlo experiment with sample sizes N=100 and N=1000 and report the bias, RMSE, standard error and size of the treatment effect estimate.
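As a hedged illustration of the two estimators compared in this section, the sketch below builds an outcome with a large mass at zero and applies both a conventional difference in means and a conditional-on-positives (COP) comparison. The DGP, the probabilities, the amounts and the function names are all assumptions and are not taken from `generate_dataset4` or `estimate_treatmenteffect_cop`.

```python
import numpy as np

P_TREATED = 0.3        # assumption: most units are untreated
P_POS_CONTROL = 0.2    # assumption: P(y > 0 | w = 0)
P_POS_TREATED = 0.5    # assumption: P(y > 0 | w = 1)
BASE, SHIFT = 2.5, 0.7 # assumption: positive amounts are BASE + SHIFT*w + U(0, 1)

# Average effect on y implied by the assumed DGP (the mean of U(0, 1) is 0.5)
TRUE_ATE = P_POS_TREATED * (BASE + SHIFT + 0.5) - P_POS_CONTROL * (BASE + 0.5)


def simulate_zero_inflated(n, seed=0):
    """Hypothetical DGP where the treatment shifts both P(y > 0) and the positive amount."""
    rng = np.random.default_rng(seed)
    w = (rng.random(n) < P_TREATED).astype(int)
    p_positive = np.where(w == 1, P_POS_TREATED, P_POS_CONTROL)
    positive = rng.random(n) < p_positive
    amount = BASE + SHIFT * w + rng.random(n)   # strictly positive
    return w, np.where(positive, amount, 0.0)


def conventional(w, y):
    """Difference in mean outcomes between treated and untreated units."""
    return y[w == 1].mean() - y[w == 0].mean()


def conditional_on_positives(w, y):
    """Same comparison restricted to observations with y > 0."""
    keep = y > 0
    return conventional(w[keep], y[keep])


w, y = simulate_zero_inflated(200_000)
print({"true_ate": TRUE_ATE,
       "conventional": conventional(w, y),
       "cop": conditional_on_positives(w, y)})
```

In this sketch the COP comparison conditions on a post-treatment event (y > 0), so it recovers only the shift in positive amounts and misses the effect that operates through the probability of a positive outcome.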

In [26]:
# Simulate N=100 observations from DGP 4 and save them to CSV
df4100 = f.generate_dataset4(100)
df4100.to_csv('data4100.csv', index=False)

In [27]:
# Simulate N=1000 observations from DGP 4 and save them to CSV
df41000 = f.generate_dataset4(1000)
df41000.to_csv('data41000.csv', index=False)

### a. You estimate the treatment effect parameter using the Conditional-on-Positives (COP) framework

In [28]:
f.estimate_treatmenteffect_cop(df4100)

{'estimated_effect': 0.6100972567533738,
 'bias': -0.8899027432466262,
 'rmse': 0.9433465658211865,
 'standard_error': 0.23699021425762756}

In [29]:
f.estimate_treatmenteffect_cop(df41000)

{'estimated_effect': 0.7633602652720606,
 'bias': -0.7366397347279394,
 'rmse': 0.8582771899147381,
 'standard_error': 0.08924465002790108}

### b. You estimate the treatment effect using the conventional method of comparing the outcomes of treated and untreated individuals.

In [30]:
f.estimate_treatmenteffect(df4100)

{'estimated_effect': 1.2551121008406538,
 'bias': -0.24488789915934617,
 'rmse': 0.49486149492493975,
 'standard_error': 0.2621350631721448}

In [31]:
f.estimate_treatmenteffect(df41000)

{'estimated_effect': 1.411413961295402,
 'bias': -0.08858603870459802,
 'rmse': 0.29763406845419765,
 'standard_error': 0.10154767436562655}