This notebook demonstrates how to build a linear structural causal model (SCM) from a directed acyclic graph (DAG) by assigning a probability distribution to each node and a weight to each edge.

In [1]:
import sys
import time
from functools import partial

import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
from numpy.random import normal, uniform

from y0.algorithm.estimation.linear_scm import get_single_door
from y0.dsl import Z1, Z2, X, Y, Z
from y0.examples import backdoor_example, frontdoor_example, napkin_example
from y0.simulation import example_generators, example_graph, get_fits_df, simulate

In [2]:
print(sys.version)

In [3]:
print(time.asctime())

In [4]:
from matplotlib_inline.backend_inline import set_matplotlib_formats

set_matplotlib_formats("svg")

In [5]:
np.random.seed(42)

In [6]:
example_graph.draw(prog="neato")

In [7]:
example_graph.to_linear_scm_latex()

## Simulating Data

It's possible to simulate data using a linear structural causal model (SCM) given the following:

1. A directed acyclic graph (DAG) structure
2. A scalar weight for each edge in the DAG
3. A probability distribution for each node
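
The three ingredients above can be sketched with plain NumPy, independent of `y0`. This is a minimal illustration rather than how `simulate` is implemented: the three-node graph, the edge weights, and the noise distributions below are all hypothetical, chosen only to show how each node's value is its own noise term plus a weighted sum of its parents.

```python
import numpy as np

rng = np.random.default_rng(0)
trials = 5000

# 1. DAG structure (hypothetical): X -> Z, X -> Y, Z -> Y.
#    Each node maps to its parents; keys are listed in topological order.
parents = {"X": [], "Z": ["X"], "Y": ["X", "Z"]}

# 2. One scalar weight per edge (hypothetical values).
weights = {("X", "Z"): 0.8, ("X", "Y"): -0.5, ("Z", "Y"): 1.5}

# 3. One noise distribution per node (hypothetical choices).
noise = {
    "X": lambda n: rng.uniform(-1.0, 1.0, size=n),
    "Z": lambda n: rng.normal(0.0, 1.0, size=n),
    "Y": lambda n: rng.uniform(-2.0, 2.0, size=n),
}

# Visit nodes in topological order so every parent is sampled before its children.
data = {}
for node in parents:
    value = noise[node](trials)
    for parent in parents[node]:
        value = value + weights[parent, node] * data[parent]
    data[node] = value

# Regressing Z on its only parent X should roughly recover the edge weight 0.8.
slope_xz = np.polyfit(data["X"], data["Z"], 1)[0]
```

Sampling in topological order is what makes a single pass sufficient: by the time a node is computed, all of its inputs already exist.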

In [8]:
df, fits = simulate(example_graph, generators=example_generators, trials=5000)

The simulation produces _trials_ data points, here 5000:

In [9]:
df

The simulation also performs several calculations over each pair of variables:

1. What's the coefficient of determination ($r^2$) between the variables?
2. Are the variables d-separated (i.e., conditionally independent)?

In [10]:
param_df = get_fits_df(fits)
param_df

The following plot demonstrates that variables that are d-separated (i.e., conditionally independent) have effectively no correlation.

In [11]:
sns.swarmplot(data=param_df, y="d_separated", x="r2", orient="h")
plt.show()
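
The relationship between d-separation and $r^2$ can also be checked by hand on a small case. The sketch below uses a hypothetical collider, X → Z ← Y, in which X and Y are d-separated by the empty set while X and Z are directly connected. The weights, noise scales, and the `r_squared` helper are assumptions made for illustration and are not part of `y0`.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Hypothetical collider: X -> Z <- Y, with unit edge weights.
x = rng.normal(size=n)
y = rng.normal(size=n)
z = 1.0 * x + 1.0 * y + rng.normal(scale=0.5, size=n)


def r_squared(a, b):
    """Return the squared Pearson correlation between two samples."""
    return float(np.corrcoef(a, b)[0, 1] ** 2)


# X and Y share no open path, so their r^2 should be close to zero.
r2_xy = r_squared(x, y)

# X is a direct cause of Z, so their r^2 should be clearly above zero.
r2_xz = r_squared(x, z)

print(f"r2(X, Y) = {r2_xy:.4f}, r2(X, Z) = {r2_xz:.4f}")
```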

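Next, we estimate parameter values (edge weights) for the linear SCM from the simulated data using the single-door criterion, which applies a backdoor-style adjustment to one edge at a time.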

In [12]:
edge_parameters = get_single_door(example_graph, df)
edge_parameters
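
To give some intuition for what a single-door estimate looks like, here is a minimal NumPy sketch on a hypothetical confounded graph (Z → X, Z → Y, X → Y). It is not the implementation behind `get_single_door`. It only shows the underlying idea: regress the child on the parent together with an adjustment set that blocks the backdoor path, then read off the parent's coefficient. All weights and variable names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
true_weight = 2.0  # hypothetical weight on the edge X -> Y

# Hypothetical linear SCM in which Z confounds X and Y.
z = rng.normal(size=n)
x = 1.5 * z + rng.normal(size=n)
y = true_weight * x - 1.0 * z + rng.normal(size=n)

# Adjusted estimate: regress Y on X and Z (plus an intercept column).
design = np.column_stack([x, z, np.ones(n)])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
adjusted = coef[0]

# Naive estimate: regress Y on X alone, leaving the backdoor path X <- Z -> Y open.
naive = np.polyfit(x, y, 1)[0]

print(f"true = {true_weight}, adjusted = {adjusted:.3f}, naive = {naive:.3f}")
```

In this setup the adjusted coefficient should land near the true weight, while the naive slope is biased because part of the association between X and Y flows through Z.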

## Backdoor example

In [13]:
backdoor_example.graph.draw()

In [14]:
backdoor_example.graph.to_linear_scm_latex()

In [15]:
backdoor_generators = {
    X: partial(uniform, low=-1.0, high=1.0),
    Y: partial(uniform, low=-2.0, high=2.0),
    Z: partial(normal, loc=0.0, scale=1.0),
}

In [16]:
backdoor_df, backdoor_fits = simulate(
    backdoor_example.graph, generators=backdoor_generators, trials=5000
)
backdoor_df

In [17]:
get_fits_df(backdoor_fits)

In [18]:
backdoor_parameters = get_single_door(backdoor_example.graph, backdoor_df)
backdoor_parameters

## Frontdoor example

In [19]:
frontdoor_example.graph.draw()

In [20]:
frontdoor_example.graph.to_linear_scm_latex()

In [21]:
frontdoor_generators = {
    X: partial(uniform, low=-1.0, high=1.0),
    Y: partial(uniform, low=-2.0, high=2.0),
    Z: partial(normal, loc=0.0, scale=1.0),
}

In [22]:
frontdoor_df, frontdoor_fits = simulate(
    frontdoor_example.graph, generators=frontdoor_generators, trials=5000
)
frontdoor_df

In [23]:
get_fits_df(frontdoor_fits)

In [24]:
frontdoor_parameters = get_single_door(frontdoor_example.graph, frontdoor_df)
frontdoor_parameters

## Napkin example

In [25]:
napkin_example.graph.draw()

In [26]:
napkin_example.graph.to_linear_scm_latex()

In [27]:
napkin_generators = {
    X: partial(uniform, low=-1.0, high=1.0),
    Y: partial(uniform, low=-2.0, high=2.0),
    Z1: partial(normal, loc=0.0, scale=1.0),
    Z2: partial(normal, loc=0.0, scale=1.0),
}

In [28]:
napkin_df, napkin_fits = simulate(napkin_example.graph, generators=napkin_generators, trials=5000)
napkin_df

In [29]:
get_fits_df(napkin_fits)

In [30]:
napkin_parameters = get_single_door(napkin_example.graph, napkin_df)
napkin_parameters