# Causal inference - An overview and some tools

This is an overview of some talks, books, blog posts and packages that are useful if you are interested in causal inference.

Hans Olav Melberg,
University of Oslo

## Texts
Note: Some of these references are relatively accessible, but others are difficult. I include the difficult ones as future reference for those who will continue to work in this area.

- [If correlation doesn't imply causation then what does?](http://www.michaelnielsen.org/ddi/if-correlation-doesnt-imply-causation-then-what-does/) - Michael Nielsen
- [Causal Inference - Slides from a presentation](http://seanjtaylor.github.io/CausalInference/#/) - Sean J. Taylor
- [Notebooks on causality and methods](https://ericmjl.github.io/causality/) - ericmjl
- [Notes on causal inference (Notebooks)](https://github.com/ijmbarr/notes-on-causal-inference) - ijmbarr
- [Causal inference with python part 1 potential outcomes](http://www.degeneratestate.org/posts/2018/Mar/24/causal-inference-with-python-part-1-potential-outcomes/) - Iain
- [Advanced data analysis from an elementary point of view](http://www.stat.cmu.edu/~cshalizi/ADAfaEPoV/) - Cosma Rohilla Shalizi
- [Causal inference book](https://www.hsph.harvard.edu/miguel-hernan/causal-inference-book/) - Miguel Hernán
- [Causality (speaker slides)](http://mlss.tuebingen.mpg.de/2017/speaker_slides/Causality.pdf) - Dominik Janzing & Bernhard Schölkopf

## Videos
- [Causal inference tutorial](https://mediasite.kellogg.northwestern.edu/Mediasite/Play/8e78dc83c6fb4d20abeeb18028a8f7071d?catalog=1533bdef-0c88-4513-ad97-5fce50c92e62) ([copy of slides and code](https://github.com/amit-sharma/causal-inference-tutorial/)) - Amit Sharma


## Packages
- [DoWhy](https://causalinference.gitlab.io/dowhy/) - Amit Sharma
- [Causal inference in python](http://causalinferenceinpython.org/) - Laurence Wong
- [Causality](https://github.com/akelleh/causality) - akelleh


# Example: Using Laurence Wong's package

In [2]:
# install the package (only needed once)
!pip install causalinference --user

Collecting causalinference
mkl-random 1.0.1 requires cython, which is not installed.
Installing collected packages: causalinference
Successfully installed causalinference-0.1.2
You are using pip version 10.0.1, however version 18.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.


In [2]:
import pandas as pd

In [3]:
# read data
df = pd.read_csv(r"C:/Users/hmelberg_adm/Documents/GitHub/causal/data/rhc.csv")

In [8]:
df.columns

Index(['Unnamed: 0', 'cat1', 'cat2', 'ca', 'sadmdte', 'dschdte', 'dthdte',
       'lstctdte', 'death', 'cardiohx', 'chfhx', 'dementhx', 'psychhx',
       'chrpulhx', 'renalhx', 'liverhx', 'gibledhx', 'malighx', 'immunhx',
       'transhx', 'amihx', 'age', 'sex', 'edu', 'surv2md1', 'das2d3pc',
       't3d30', 'dth30', 'aps1', 'scoma1', 'meanbp1', 'wblc1', 'hrt1', 'resp1',
       'temp1', 'pafi1', 'alb1', 'hema1', 'bili1', 'crea1', 'sod1', 'pot1',
       'paco21', 'ph1', 'swang1', 'wtkilo1', 'dnr1', 'ninsclas', 'resp',
       'card', 'neuro', 'gastr', 'renal', 'meta', 'hema', 'seps', 'trauma',
       'ortho', 'adld3p', 'urin1', 'race', 'income', 'ptid'],
      dtype='object')

In [10]:
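# recode Yes/No strings to 1/0 across the whole data frame
df = df.replace({'Yes': 1, 'No':0})

In [None]:
# recode the treatment indicator (right heart catheterization, RHC) to 0/1
df.swang1 = df.swang1.replace({'No RHC':0, 'RHC':1})

In [29]:
# recode sex to a 0/1 indicator for female
df['female'] = df.sex.replace({'Male':0, 'Female':1})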

In [38]:
# define variables

y = df.dth30.values # outcome is death within 30 days
d = df.swang1.values # treatment variable 
x = df[['age', 'female']].values # variables to adjust for


In [31]:
# import the package for causal inference
from causalinference import CausalModel
from causalinference.utils import random_data
# for more details, see: https://github.com/laurencium/causalinference/blob/master/docs/tex/vignette.pdf

In [39]:
# input the data to the model
causal = CausalModel(y, d, x)

In [42]:
# get an overview of the data
# Note: it gives the pre-treatment balance on the covariates
print(causal.summary_stats)


Summary Statistics

                      Controls (N_c=3551)        Treated (N_t=2184)             
       Variable         Mean         S.d.         Mean         S.d.     Raw-diff
--------------------------------------------------------------------------------
              Y        0.306        0.461        0.380        0.486        0.074

                      Controls (N_c=3551)        Treated (N_t=2184)             
       Variable         Mean         S.d.         Mean         S.d.     Nor-diff
--------------------------------------------------------------------------------
             X0       61.761       17.288       60.750       15.631       -0.061
             X1        0.461        0.499        0.415        0.493       -0.093
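
The `Nor-diff` column is a normalized difference: the difference in group means divided by the square root of the average of the two group variances. The sketch below rebuilds both values from the rounded numbers in the table (X0 is `age` and X1 is `female`, following the column order passed to `CausalModel`). The function name `normalized_diff` is only illustrative and is not part of the package.

```python
import math

def normalized_diff(mean_c, sd_c, mean_t, sd_t):
    """Difference in means scaled by the root of the average group variance."""
    return (mean_t - mean_c) / math.sqrt((sd_c**2 + sd_t**2) / 2)

# Rounded values copied from the summary table (controls first, then treated)
age_diff = normalized_diff(61.761, 17.288, 60.750, 15.631)
female_diff = normalized_diff(0.461, 0.499, 0.415, 0.493)

print(round(age_diff, 3), round(female_diff, 3))
```

Because the inputs are already rounded to three decimals, the result can differ from the package's output in the last digit for other data sets.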



In [45]:
# estimate the treatment effects with ordinary least squares (OLS)
causal.est_via_ols()



In [46]:
# get the results
print(causal.estimates)


Treatment Effect Estimates: OLS

                     Est.       S.e.          z      P>|z|      [95% Conf. int.]
--------------------------------------------------------------------------------
           ATE      0.077      0.013      5.957      0.000      0.052      0.102
           ATC      0.078      0.013      6.005      0.000      0.052      0.103
           ATT      0.075      0.013      5.854      0.000      0.050      0.101
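
The columns of this table are linked by simple arithmetic: `z` is the estimate divided by its standard error, `P>|z|` is the two-sided p-value under a standard normal distribution, and the interval is the estimate plus or minus 1.96 standard errors. The sketch below recomputes the ATE row from the rounded values, assuming the normal approximation that the column headers suggest. The helper `z_summary` is an illustrative name, not a package function.

```python
import math

def z_summary(est, se, crit=1.96):
    """Return z statistic, two-sided normal p-value and a 95% interval."""
    z = est / se
    # 2 * (1 - Phi(|z|)) written with the complementary error function
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p, est - crit * se, est + crit * se

# Rounded ATE estimate and standard error from the OLS table
z, p, lower, upper = z_summary(0.077, 0.013)
print(round(z, 2), round(p, 3), round(lower, 3), round(upper, 3))
```

The recomputed `z` is slightly lower than the printed 5.957 because the table shows the estimate and standard error rounded to three decimals.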



In [47]:
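# estimate the propensity scores; the _s variant selects the covariate terms
# (linear, squared and interaction terms) automatically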
causal.est_propensity_s()

In [49]:
print(causal.propensity)


Estimated Parameters of Propensity Score

                    Coef.       S.e.          z      P>|z|      [95% Conf. int.]
--------------------------------------------------------------------------------
     Intercept     -2.309      0.306     -7.555      0.000     -2.909     -1.710
            X1      0.480      0.215      2.237      0.025      0.060      0.901
            X0      0.071      0.011      6.739      0.000      0.050      0.091
         X0*X0     -0.001      0.000     -6.762      0.000     -0.001     -0.000
         X1*X0     -0.011      0.003     -3.097      0.002     -0.017     -0.004



In [51]:
# trim the sample: pick a propensity-score cutoff automatically and
# drop observations with scores outside [cutoff, 1 - cutoff]
causal.trim_s()

0.1478055598584242
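
The number printed above is presumably the selected cutoff. Trimming then keeps only the units whose estimated propensity score lies between the cutoff and one minus the cutoff. A minimal sketch of that rule, using made-up scores and a hypothetical helper called `trim`:

```python
def trim(scores, cutoff):
    """Return indices of units with a propensity score inside [cutoff, 1 - cutoff]."""
    return [i for i, p in enumerate(scores) if cutoff <= p <= 1 - cutoff]

cutoff = 0.1478  # rounded from the value printed above
scores = [0.05, 0.14, 0.15, 0.40, 0.86, 0.95]  # made-up propensity scores

kept = trim(scores, cutoff)
print(kept)  # -> [2, 3]
```

In this data set only two control observations fall outside the interval, which is why the summary statistics barely change after trimming.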

In [52]:
print(causal.summary_stats)


Summary Statistics

                      Controls (N_c=3549)        Treated (N_t=2184)             
       Variable         Mean         S.d.         Mean         S.d.     Raw-diff
--------------------------------------------------------------------------------
              Y        0.306        0.461        0.380        0.486        0.074

                      Controls (N_c=3549)        Treated (N_t=2184)             
       Variable         Mean         S.d.         Mean         S.d.     Nor-diff
--------------------------------------------------------------------------------
             X0       61.739       17.267       60.750       15.631       -0.060
             X1        0.461        0.499        0.415        0.493       -0.093



In [54]:
# stratify the sample into propensity-score bins (boundaries chosen automatically)
causal.stratify_s()
print(causal.strata)


Stratification Summary

              Propensity Score         Sample Size     Ave. Propensity   Outcome
   Stratum      Min.      Max.  Controls   Treated  Controls   Treated  Raw-diff
--------------------------------------------------------------------------------
         1     0.148     0.205        39         7     0.185     0.199     0.278
         2     0.205     0.227        38         7     0.217     0.217     0.425
         3     0.227     0.249        70        20     0.239     0.240     0.386
         4     0.250     0.275       126        53     0.263     0.262     0.094
         5     0.276     0.310       249       109     0.294     0.297     0.023
         6     0.310     0.354       488       228     0.335     0.334     0.074
         7     0.354     0.398       879       554     0.379     0.379     0.124
         8     0.398     0.439      1660      1206     0.422     0.422     0.052
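
One rough way to read this table is to average the within-stratum outcome differences, weighting each stratum by its size. The sketch below does that with the numbers copied from the table. It is only a reading aid: the package's own blocking estimator (`est_via_blocking`) works differently, so the result is not expected to match it exactly.

```python
# (controls, treated, raw outcome difference) per stratum, copied from the table
strata = [
    (39, 7, 0.278), (38, 7, 0.425), (70, 20, 0.386), (126, 53, 0.094),
    (249, 109, 0.023), (488, 228, 0.074), (879, 554, 0.124), (1660, 1206, 0.052),
]

# Weight each stratum's difference by its share of the trimmed sample
total = sum(n_c + n_t for n_c, n_t, _ in strata)
weighted_diff = sum((n_c + n_t) * diff for n_c, n_t, diff in strata) / total

print(total, round(weighted_diff, 3))
```

The first three strata show large differences but contain few treated units, so they contribute little to the weighted average.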





In [55]:
# re-estimate on the trimmed sample, with OLS and with nearest-neighbour matching
causal.est_via_ols()
causal.est_via_matching()
print(causal.estimates)




Treatment Effect Estimates: OLS

                     Est.       S.e.          z      P>|z|      [95% Conf. int.]
--------------------------------------------------------------------------------
           ATE      0.077      0.013      5.959      0.000      0.052      0.102
           ATC      0.078      0.013      6.008      0.000      0.052      0.103
           ATT      0.075      0.013      5.857      0.000      0.050      0.101

Treatment Effect Estimates: Matching

                     Est.       S.e.          z      P>|z|      [95% Conf. int.]
--------------------------------------------------------------------------------
           ATE      0.089      0.021      4.234      0.000      0.048      0.130
           ATC      0.087      0.023      3.734      0.000      0.042      0.133
           ATT      0.091      0.024      3.834      0.000      0.044      0.138
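
The idea behind the matching estimates can be sketched in a few lines: for every unit, the missing potential outcome is imputed from the closest unit in the other group, and the unit-level differences are averaged. The example below is a simplified illustration with one covariate, one neighbour, matching with replacement and no bias adjustment. The data and the function names are made up, and this is not the package's implementation.

```python
def nearest(value, pool):
    """Return the (covariate, outcome) pair in pool closest to value."""
    return min(pool, key=lambda unit: abs(unit[0] - value))

def matching_ate(treated, controls):
    """One-nearest-neighbour matching with replacement on a single covariate."""
    effects = []
    for x, y in treated:
        _, y0 = nearest(x, controls)  # impute the untreated outcome
        effects.append(y - y0)
    for x, y in controls:
        _, y1 = nearest(x, treated)   # impute the treated outcome
        effects.append(y1 - y)
    return sum(effects) / len(effects)

# Made-up (covariate, outcome) pairs
treated = [(1.0, 5.0), (3.0, 7.0)]
controls = [(1.1, 4.0), (2.9, 6.0), (5.0, 8.0)]

ate = matching_ate(treated, controls)
print(ate)
```

The control unit at 5.0 has no close treated counterpart, so its imputed effect is driven by a poor match. This is the kind of observation that trimming on the propensity score is meant to remove.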



# Same example, using another package: pymatch

In [7]:
!pip install pymatch --user
# more info here: https://github.com/benmiroglio/pymatch

mkl-random 1.0.1 requires cython, which is not installed.
You are using pip version 10.0.1, however version 18.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.


In [8]:
from pymatch.Matcher import Matcher

ModuleNotFoundError: No module named 'pymatch'
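
The import fails even though the pip cell ran, and the output does not say why. One possible cause, which is an assumption here, is that pip installed into a different Python environment than the one the notebook kernel uses. The sketch below checks whether the running interpreter can see a module and builds a pip command tied to that same interpreter. The helper names are illustrative.

```python
import importlib.util
import sys

def is_importable(module_name):
    """True if the running interpreter can locate the top-level module."""
    return importlib.util.find_spec(module_name) is not None

def pip_command(package):
    """pip invocation bound to the interpreter that runs this code."""
    return [sys.executable, "-m", "pip", "install", package]

print(is_importable("pymatch"))
print(pip_command("pymatch"))
```

In a notebook, the equivalent cell would be `!{sys.executable} -m pip install pymatch`, followed by a kernel restart before retrying the import.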