In [7]:
import warnings
warnings.filterwarnings('ignore')

import pandas as pd
import numpy as np
import statsmodels.formula.api as smf
import graphviz as gr

In [4]:
data = pd.read_csv("online_classroom.csv").query("format_blended==0")
result = smf.ols('falsexam ~ format_ol', data=data).fit()
result.summary().tables[1]

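|           | coef    | std err | t      | P>\|t\| | [0.025 | 0.975] |
|-----------|---------|---------|--------|---------|--------|--------|
| Intercept | 78.5475 | 1.113   | 70.563 | 0.000   | 76.353 | 80.742 |
| format_ol | -4.9122 | 1.680   | -2.925 | 0.004   | -8.223 | -1.601 |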
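Because `format_ol` is a 0/1 dummy, the OLS intercept equals the mean exam score of the face-to-face group and the slope equals the difference in group means. The sketch below checks this on synthetic data; the sample size and score distribution are made-up assumptions, not the `online_classroom.csv` data.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 300

# Synthetic stand-ins (illustrative only) for the 0/1 treatment and the exam score
format_ol = rng.integers(0, 2, size=n)
falsexam = 78 - 5 * format_ol + rng.normal(scale=10, size=n)

# OLS of falsexam on an intercept and the dummy
X = np.column_stack([np.ones(n), format_ol])
(intercept, slope), *_ = np.linalg.lstsq(X, falsexam, rcond=None)

# Group means computed directly
mean_control = falsexam[format_ol == 0].mean()
mean_treated = falsexam[format_ol == 1].mean()

print(f"intercept = {intercept:.4f}, control mean = {mean_control:.4f}")
print(f"slope = {slope:.4f}, difference in means = {mean_treated - mean_control:.4f}")
```

Both lines print matching pairs of numbers, which is why the `format_ol` coefficient above can be read as the average score gap between online and face-to-face students.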


To see the effect of a confounder, consider a model of how education affects wages:
$$
Wage_i = \alpha + \kappa \ Educ_i + A_i'\beta + u_i
$$

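Wage is affected by education, whose effect is measured by $\kappa$, and by additional ability factors, denoted by the vector $A$. If $A$ is left out and wage is regressed on education alone, the resulting slope coefficient is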
$$
\dfrac{Cov(Wage_i, Educ_i)}{Var(Educ_i)} = \kappa + \beta'\delta_{Ability}
$$

where $\delta_{Ability}$ is the vector of coefficients from regressions of each element of $A$ on $Educ$. The term $\beta'\delta_{Ability}$ is the omitted variable bias.
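The formula can be checked numerically. The sketch below assumes a hypothetical data-generating process with a single ability variable, $\kappa = 1$ and $\beta = 3$, where education depends on ability. In any sample, the short-regression slope equals $\hat\kappa + \hat\beta\hat\delta$ exactly, with $\hat\kappa, \hat\beta$ from the long regression and $\hat\delta$ from regressing ability on education.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Hypothetical DGP (all coefficients are illustrative assumptions)
ability = rng.normal(size=n)
educ = 12 + 2 * ability + rng.normal(size=n)                # ability raises education
wage = 5 + 1.0 * educ + 3.0 * ability + rng.normal(size=n)  # kappa = 1, beta = 3

def ols(y, *regressors):
    """OLS coefficients of y on an intercept plus the given regressors."""
    X = np.column_stack([np.ones_like(y), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# Short regression slope: Cov(Wage, Educ) / Var(Educ), both with ddof=1
short_slope = np.cov(wage, educ)[0, 1] / np.var(educ, ddof=1)

# Long regression of wage on educ and ability
_, kappa_hat, beta_hat = ols(wage, educ, ability)

# Auxiliary regression of the omitted variable on the treatment
_, delta_hat = ols(ability, educ)

print(f"short slope:          {short_slope:.4f}")
print(f"kappa + beta * delta: {kappa_hat + beta_hat * delta_hat:.4f}")
```

Both printed values agree, and they should sit close to the population value $1 + 3 \times 0.4 = 2.2$, since here $\delta = Cov(A, Educ)/Var(Educ) = 2/5$.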

The bias term $\beta'\delta_{Ability}$ is zero if the omitted variables have no effect on the dependent variable $Y$ (here, wage), so that $\beta = 0$. It is also zero if the omitted variables are unrelated to the treatment variable (here, education), so that $\delta_{Ability} = 0$.
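A small simulation makes both conditions concrete. It assumes a hypothetical model with one ability variable, where `gamma` (a name introduced here for illustration) controls how strongly ability drives education and `beta` controls how strongly it drives wage; the short-regression slope should recover $\kappa = 1$ whenever either one is zero.

```python
import numpy as np

def simulate_short_slope(beta, gamma, kappa=1.0, n=50_000, seed=0):
    """Slope of wage on educ when ability is omitted (hypothetical DGP).

    educ = gamma * ability + noise
    wage = kappa * educ + beta * ability + noise
    """
    rng = np.random.default_rng(seed)
    ability = rng.normal(size=n)
    educ = gamma * ability + rng.normal(size=n)
    wage = kappa * educ + beta * ability + rng.normal(size=n)
    return np.cov(wage, educ)[0, 1] / np.var(educ, ddof=1)

# Biased case, then beta = 0 (no effect on Y), then gamma = 0 (unrelated to T)
for beta, gamma in [(3.0, 2.0), (0.0, 2.0), (3.0, 0.0)]:
    slope = simulate_short_slope(beta, gamma)
    print(f"beta={beta}, gamma={gamma}: short slope = {slope:.3f}")
```

Only the first case should land far from 1 (near 2.2); the other two should stay close to the true $\kappa$.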


A confounder is a variable that causes both the treatment and the outcome. In the wage example, IQ is a confounder. People with high IQ tend to complete more years of education because studying is easier for them, so IQ causes education. People with high IQ also tend to be more productive and consequently earn higher wages, so IQ also causes wage. Since a confounder affects both the treatment and the outcome, in the DAG it has arrows pointing into both T and Y.
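The definition above can be checked mechanically on a list of edges. This is a minimal sketch with a hypothetical helper `direct_common_causes`; it only looks for variables with a direct arrow into both the treatment and the outcome, so it ignores confounding that runs through longer paths.

```python
# Hypothetical edge list for the wage DAG, as (cause, effect) pairs
edges = [("IQ", "Educ"), ("IQ", "Wage"), ("Educ", "Wage")]

def direct_common_causes(edges, treatment, outcome):
    """Return the variables with a direct arrow into both treatment and outcome."""
    causes_of_t = {src for src, dst in edges if dst == treatment}
    causes_of_y = {src for src, dst in edges if dst == outcome}
    return causes_of_t & causes_of_y

print(direct_common_causes(edges, "Educ", "Wage"))  # {'IQ'}
```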

In [None]:
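g = gr.Digraph()

# Generic confounding structure: W causes both the treatment T and the outcome Y
g.edge("W", "T")
g.edge("W", "Y")
g.edge("T", "Y")

# Wage example: IQ causes both education and wage
g.edge("IQ", "Educ", color="red")
g.edge("IQ", "Wage", color="red")
g.edge("Educ", "Wage", color="red")

# Police example: crime causes both police presence and violence
g.edge("Crime", "Police", color="red")
g.edge("Crime", "Violence", color="red")
g.edge("Police", "Violence", color="blue")

g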

From the graph above, we can see that IQ causes both education and wage: higher IQ leads to more education and to higher wages. If IQ is not taken into account in the model, part of its effect on wage will flow through its correlation with education, making the impact of education look larger than it actually is. In terms of the bias formula, IQ raises wages ($\beta > 0$) and is positively related to education ($\delta_{Ability} > 0$), so the bias term $\beta'\delta_{Ability}$ is positive. This is an example of positive bias caused by a confounder.