# Motivating example: Figure 4

In [2]:
!pip install git+https://github.com/y0-causal-inference/eliater.git@linear-regression


This notebook walks through the motivating example shown in Figure 4(a) of the paper *Eliater: an analytical workflow and open source implementation for causal query estimation in biomolecular networks*.

In [3]:
from eliater.examples.frontdoor_backdoor_discrete import single_mediator_with_multiple_confounders_nuisances_discrete_example

In [9]:
graph = single_mediator_with_multiple_confounders_nuisances_discrete_example.graph

In [10]:
data = single_mediator_with_multiple_confounders_nuisances_discrete_example.generate_data(num_samples=500, seed=1)

In [11]:
data.head()

Unnamed: 0,X,M1,Z1,Z2,Z3,R1,R2,R3,Y
0,1,1,1,1,1,1,1,1,1
1,0,1,1,1,0,1,0,1,1
2,1,1,0,1,1,1,1,1,1
3,1,1,1,1,1,1,1,0,1
4,1,1,0,1,1,0,0,1,1


## Step 1: Verify correctness of the network structure

In [12]:
from eliater.network_validation import print_graph_falsifications

In [13]:
print_graph_falsifications(graph, data, method="chi-square", verbose=True, significance_level=0.01)

Failed tests: 0/26 (0.00%)
Reject null hypothesis when p<0.01
left    right    given         stats         p    dof    p_adj  p_adj_significant
M1      R2       R1       3.49737     0.174002      2        1  False
M1      Z3       X        0           1             2        1  False
R1      Z2       X        0           1             2        1  False
R1      R3       R2|Y     0.673461    0.954561      4        1  False
R1      Z3       X        0.320228    0.852047      2        1  False
X       Z2       Z1       0.00364509  0.998179      2        1  False
R2      Y        R1       0           1             2        1  False
X       Z3       Z1       1.8973      0.387263      2        1  False
M1      R3       R1|Y     1.95331     0.582155      3        1  False
Y       Z1       X|Z3     0.443173    0.978792      4        1  False
R1      X        M1       0.335248    0.845672      2        1  False
Y       Z2       X|Z3     0.869422    0.928906      4        1  False
R3      Z1      

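None of the 26 conditional independence tests implied by the network's d-separations is rejected by the data at the 0.01 significance level, so the data give no evidence against the network structure. Hence, we can proceed to step 2.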
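To make the `stats`, `dof`, and `p` columns above more concrete, here is a minimal pure-Python sketch of a stratified Pearson chi-square test of conditional independence. It is an illustration under stated assumptions, not the code that `print_graph_falsifications` runs: the helper names `chi2_sf` and `conditional_chi_square`, the toy columns `A`, `B`, `C`, and the choice to skip degenerate strata and apply no continuity correction are all assumptions made here.

```python
import math
from collections import Counter, defaultdict


def chi2_sf(stat, dof):
    """Upper-tail probability of a chi-square distribution (closed-form series)."""
    if dof <= 0:
        return 1.0
    half = stat / 2.0
    if dof % 2 == 0:
        # Even dof: exp(-x/2) * sum_{i < dof/2} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, dof // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd dof: erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2) * sum_i x^i / (1*3*...*(2i+1))
    term, total = 1.0, 0.0
    for i in range((dof - 1) // 2):
        if i > 0:
            term *= stat / (2 * i + 1)
        total += term
    return math.erfc(math.sqrt(half)) + math.sqrt(2 * stat / math.pi) * math.exp(-half) * total


def conditional_chi_square(rows, left, right, given):
    """Sum the Pearson chi-square statistic and dof over strata of `given`."""
    strata = defaultdict(list)
    for row in rows:
        strata[tuple(row[g] for g in given)].append((row[left], row[right]))
    stat, dof = 0.0, 0
    for pairs in strata.values():
        n = len(pairs)
        cell = Counter(pairs)
        left_tot = Counter(l for l, _ in pairs)
        right_tot = Counter(r for _, r in pairs)
        if len(left_tot) < 2 or len(right_tot) < 2:
            continue  # assumption: a stratum with a constant variable adds nothing
        for l, n_l in left_tot.items():
            for r, n_r in right_tot.items():
                expected = n_l * n_r / n
                stat += (cell[(l, r)] - expected) ** 2 / expected
        dof += (len(left_tot) - 1) * (len(right_tot) - 1)
    return stat, dof, chi2_sf(stat, dof)


# Hypothetical toy data: A and B are independent when C=0 but associated when C=1.
cell_counts = {
    (0, 0, 0): 10, (0, 1, 0): 10, (1, 0, 0): 10, (1, 1, 0): 10,
    (0, 0, 1): 20, (0, 1, 1): 5, (1, 0, 1): 5, (1, 1, 1): 20,
}
rows = [
    {"A": a, "B": b, "C": c}
    for (a, b, c), n in cell_counts.items()
    for _ in range(n)
]
stat, dof, p_value = conditional_chi_square(rows, "A", "B", ["C"])
print(stat, dof, p_value)
```

With two degrees of freedom the tail probability reduces to $e^{-\text{stat}/2}$, so `chi2_sf(3.49737, 2)` gives about 0.174, which agrees with the `p` reported for the first row of the table. Summing degrees of freedom over strata is also consistent with the `dof` values of 2 and 4 shown for one and two binary conditioning variables.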

## Step 2: Check query identifiability

In [14]:
from y0.algorithm.identify import Identification
from y0.dsl import P, Variable
id_in = Identification.from_expression(
    query=P(Variable('Y') @ Variable('X')),
    graph=graph,
)
id_in

Identification(outcomes="{Y}, treatments="{X}",conditions="set()",  graph="NxMixedGraph(directed=<networkx.classes.digraph.DiGraph object at 0x000001D6A992B910>, undirected=<networkx.classes.graph.Graph object at 0x000001D6A9928810>)", estimand="P(M1, R1, R2, R3, X, Y, Z1, Z2, Z3)")

The query is identifiable. Hence we can proceed to step 3.

## Step 3: Find nuisance variables and mark them as latent

In [15]:
from eliater.discover_latent_nodes import find_nuisance_variables, mark_nuisance_variables_as_latent

`find_nuisance_variables` returns the nuisance variables of the input graph for a given treatment and outcome.

In [16]:
nuisance_variables = find_nuisance_variables(graph, treatments=Variable("X"), outcomes=Variable("Y"))
nuisance_variables

{R1, R2, R3}

The nuisance variables are $R_1$, $R_2$, and $R_3$.
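As a rough illustration of what such a search involves, the sketch below assumes one common reading of "nuisance variable": a descendant of a node on a directed path from the treatment to the outcome that is not itself an ancestor of the outcome. Both that definition and the toy edge list are assumptions made for this example. The edge list is hypothetical and is not the graph loaded above, and `find_nuisance_sketch` is not the Eliater implementation.

```python
def reachable(adjacency, start):
    """Return all nodes reachable from `start` by following `adjacency` edges."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for nxt in adjacency.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def find_nuisance_sketch(edges, treatment, outcome):
    """Descendants of nodes on causal paths that are not ancestors of the outcome."""
    children, parents = {}, {}
    for u, v in edges:
        children.setdefault(u, set()).add(v)
        parents.setdefault(v, set()).add(u)

    ancestors_of_outcome = reachable(parents, outcome)
    # Nodes lying on some directed path from the treatment to the outcome.
    on_causal_paths = ({treatment} | reachable(children, treatment)) & (
        {outcome} | ancestors_of_outcome
    )
    descendants = set()
    for node in on_causal_paths:
        descendants |= reachable(children, node)
    return descendants - ancestors_of_outcome - {treatment, outcome}


# Hypothetical toy graph, chosen only so the sketch has something to work on.
toy_edges = [
    ("Z1", "X"), ("Z1", "Y"),
    ("X", "M1"), ("M1", "Y"),
    ("M1", "R1"), ("R1", "R2"), ("R2", "R3"), ("Y", "R3"),
]
nuisance = find_nuisance_sketch(toy_edges, "X", "Y")
print(sorted(nuisance))
```

In this toy graph the nodes downstream of `M1` and `Y` that never feed back into `Y` are the ones flagged, which is the intuition behind treating them as irrelevant to the query.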

## Step 4: Simplify the network

The following function finds the nuisance variables (step 3), marks them as latent, and then applies Evans' simplification rules to remove them. The resulting graph contains no nuisance variables.

In [17]:
from eliater.discover_latent_nodes import remove_nuisance_variables

In [18]:
new_graph = remove_nuisance_variables(graph, treatments=Variable("X"), outcomes=Variable("Y"))
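The sketch below illustrates just one of the simplification rules, namely that a latent node with no children can be dropped without changing the distribution over the remaining variables. It is a partial illustration on a hypothetical toy edge list, not the graph used above and not what `remove_nuisance_variables` does internally. The function name `drop_childless_latents` is made up for this example.

```python
def drop_childless_latents(edges, latents):
    """Repeatedly remove latent nodes that have no outgoing edges."""
    edges = set(edges)
    latents = set(latents)
    changed = True
    while changed:
        changed = False
        nodes_with_children = {u for u, _ in edges}
        for node in sorted(latents):
            if node not in nodes_with_children:
                # Removing the node also removes every edge pointing into it.
                edges = {(u, v) for u, v in edges if v != node}
                latents.discard(node)
                changed = True
                break  # recompute which nodes still have children
    return edges, latents


# Hypothetical toy graph with R1, R2, R3 already marked as latent.
toy_edges = [
    ("Z1", "X"), ("Z1", "Y"),
    ("X", "M1"), ("M1", "Y"),
    ("M1", "R1"), ("R1", "R2"), ("R2", "R3"), ("Y", "R3"),
]
simplified_edges, remaining_latents = drop_childless_latents(
    toy_edges, latents={"R1", "R2", "R3"}
)
print(sorted(simplified_edges), remaining_latents)
```

Here `R3` is removed first because it has no children. That leaves `R2` childless, and then `R1`, so the chain of latent nodes is removed one at a time. A latent node that still has observed children would survive this rule and would need the other rules.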

## Step 5: Estimate the query

In [19]:
from y0.algorithm.estimation import estimate_ace

In [21]:
ATE_value = estimate_ace(graph=new_graph,
                         treatments=Variable("X"),
                         outcomes=Variable("Y"),
                         data=data)
ATE_value

-0.6949811379120795
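For reading this number, it may help to recall the usual definition of the average causal effect (ACE, also called the average treatment effect) of a binary treatment. The assumption here is that `estimate_ace` follows this convention, with $X = 1$ as the treated level and $X = 0$ as the control level:

$$
\mathrm{ACE} = \mathbb{E}[Y \mid do(X = 1)] - \mathbb{E}[Y \mid do(X = 0)]
$$

Under that reading, an estimate of about $-0.69$ would mean that setting $X$ to 1 rather than 0 lowers the expected value of $Y$ by roughly 0.69. Because $Y$ is binary in this example, that corresponds to a drop of about 0.69 in $P(Y = 1)$.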