In [0]:
#!pip install git+https://github.com/microsoft/dowhy.git

# DoWhy: Different estimation methods for causal inference
This is a quick introduction to the DoWhy causal inference library. We will load a sample dataset and use different methods to estimate the causal effect of a (pre-specified) treatment variable on a (pre-specified) outcome variable.

In [0]:
import numpy as np
import pandas as pd
import logging

import dowhy
from dowhy import CausalModel  # older releases exposed this as dowhy.do_why.CausalModel
import dowhy.datasets

A dataset from LaLonde, Robert. 1986. “Evaluating the Econometric Evaluations of Training Programs.” American Economic Review 76:604-620. 

The study looked at the effect of a job training program (the treatment) on an individual's real earnings a couple of years after completing the program.

The data consists of a number of demographic variables (age, race, academic background, and previous real earnings), as well as a treatment indicator, and the real earnings in the year 1978 (the response).

445 observations on the following 12 variables:
* age - age in years.
* educ - years of schooling.
* black - indicator variable for blacks.
* hisp - indicator variable for Hispanics.
* married - indicator variable for marital status.
* nodegr - indicator variable for having no high school diploma.
* re74 - real earnings in 1974.
* re75 - real earnings in 1975.
* re78 - real earnings in 1978.
* u74 - indicator variable for earnings in 1974 being zero.
* u75 - indicator variable for earnings in 1975 being zero.
* treat - an indicator variable for treatment status.

In [0]:
from rpy2.robjects import r as R

%load_ext rpy2.ipython
%R install.packages("Matching")
%R library(Matching)
%R data(lalonde)
%R -o lalonde
lalonde.to_csv("lalonde.csv",index=False)

In [0]:
# The data was already loaded in the previous cell. We read it back from
# the saved CSV here so you don't have to keep re-downloading it.

lalonde = pd.read_csv("lalonde.csv")
lalonde.describe()

What is the naive estimate of the treatment effect?

If the control group has higher earnings than the treatment group, does this mean the treatment had a negative impact?
Check out the income distributions in the control and treatment groups in 1978 and in previous years.
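
One way to get a naive estimate is a plain difference in group means. The following is a minimal sketch, assuming the `treat` and `re78` column names from the variable list; the helper name `naive_difference_in_means` and the four-row toy frame are made up for illustration and are not LaLonde data.

```python
import pandas as pd


def naive_difference_in_means(df, treatment="treat", outcome="re78"):
    """Mean outcome of treated rows minus mean outcome of control rows."""
    group_means = df.groupby(treatment)[outcome].mean()
    return group_means.loc[1] - group_means.loc[0]


# Toy frame with invented numbers, only to show the call.
toy = pd.DataFrame({"treat": [1, 1, 0, 0], "re78": [10.0, 14.0, 8.0, 12.0]})
toy_diff = naive_difference_in_means(toy)  # (10 + 14) / 2 - (8 + 12) / 2
```

On the notebook's data the call would be `naive_difference_in_means(lalonde)`, and `lalonde.groupby("treat")[["re74", "re75", "re78"]].describe()` gives a quick per-group summary of the earnings distributions. This number ignores any differences between the groups other than the treatment.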

Check the differences in distributions of other variables in treatment and control groups:
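
A common way to summarize such differences is the standardized mean difference of each covariate between the two groups. The sketch below is one possible approach, not something prescribed by this notebook; the helper name `standardized_mean_differences` and the toy values are assumptions for illustration only.

```python
import numpy as np
import pandas as pd


def standardized_mean_differences(df, covariates, treatment="treat"):
    """Per-covariate (treated mean - control mean) / pooled standard deviation."""
    treated = df[df[treatment] == 1]
    control = df[df[treatment] == 0]
    mean_diff = treated[covariates].mean() - control[covariates].mean()
    # Pooled SD: square root of the average of the two sample variances.
    pooled_sd = np.sqrt((treated[covariates].var() + control[covariates].var()) / 2)
    return mean_diff / pooled_sd


# Toy frame with invented numbers, only to show the call.
toy = pd.DataFrame(
    {
        "treat": [1, 1, 0, 0],
        "age": [30, 34, 20, 24],
        "educ": [12, 12, 10, 14],
    }
)
toy_smd = standardized_mean_differences(toy, ["age", "educ"])
```

On the notebook's data the call could be `standardized_mean_differences(lalonde, ["age", "educ", "black", "hisp", "married", "nodegr", "re74", "re75"])`. Values far from zero point to covariates that are distributed differently in the two groups.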

Build a causal model to estimate the effect of the treatment on the outcome.

In [0]:
model=CausalModel(
        data = lalonde,
        treatment="treat",
        outcome="re78",
        ...
        )

model.view_model()

Estimate the causal effect using several possible techniques and compare the results. Which one do you trust more? (Hint: try refuting them.)
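
The comparison can be organized as a loop over estimator names, as in the sketch below. It assumes `model` is a fully specified `CausalModel` (the graph or common causes elided above are not filled in here) and relies on DoWhy's `identify_effect`, `estimate_effect`, and `refute_estimate` methods. The helper `compare_estimators` and the list `BACKDOOR_METHODS` are hypothetical names introduced for illustration.

```python
def compare_estimators(model, estimator_names, refuter_name="placebo_treatment_refuter"):
    """Estimate the effect with each method, then rerun it under one refuter."""
    identified_estimand = model.identify_effect()
    results = {}
    for name in estimator_names:
        estimate = model.estimate_effect(identified_estimand, method_name=name)
        refutation = model.refute_estimate(
            identified_estimand, estimate, method_name=refuter_name
        )
        results[name] = {
            "estimate": estimate.value,
            "after_refuter": refutation.new_effect,
        }
    return results


# A few of DoWhy's backdoor estimators; pick whichever suit your model.
BACKDOOR_METHODS = [
    "backdoor.linear_regression",
    "backdoor.propensity_score_matching",
    "backdoor.propensity_score_stratification",
    "backdoor.propensity_score_weighting",
]
```

A call would look like `compare_estimators(model, BACKDOOR_METHODS)`. With the placebo refuter the treatment is replaced by a random variable, so an estimator you can trust should return an effect close to zero in the `after_refuter` column.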
