# Basic Example for generating samples from a GCM

A graphical causal model (GCM) describes the data-generating process of the modeled variables. Therefore, after we fit
a GCM, we can also generate completely new samples from it. This can be done by sorting the nodes in topological
order, randomly sampling from the root nodes and then propagating the data through the graph by evaluating the downstream
causal mechanisms with randomly sampled noise. The ``dowhy.gcm`` package provides a simple helper function that does
this automatically and thereby offers a simple API to draw samples from a GCM.
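The procedure described above is known as ancestral sampling. The following is a minimal sketch of it, assuming hand-written linear mechanisms with standard normal noise for the chain X→Y→Z. The ``parents`` and ``mechanisms`` dictionaries and the ``ancestral_sample`` helper are hypothetical illustrations, not part of the ``dowhy.gcm`` API; the topological order comes from Python's standard-library ``graphlib``.

```python
from graphlib import TopologicalSorter

import numpy as np

# Hypothetical graph representation: each node maps to the list of its parents.
parents = {"X": [], "Y": ["X"], "Z": ["Y"]}

# Hypothetical causal mechanisms: (parent values, noise) -> node values.
mechanisms = {
    "X": lambda pa, noise: noise,                  # root node: pure noise
    "Y": lambda pa, noise: 2 * pa["X"] + noise,    # additive noise model
    "Z": lambda pa, noise: 3 * pa["Y"] + noise,    # additive noise model
}


def ancestral_sample(parents, mechanisms, num_samples, rng):
    """Sample every node after its parents, feeding fresh noise into each mechanism."""
    samples = {}
    # static_order() yields each node only after all of its parents.
    for node in TopologicalSorter(parents).static_order():
        parent_values = {p: samples[p] for p in parents[node]}
        noise = rng.normal(loc=0, scale=1, size=num_samples)
        samples[node] = mechanisms[node](parent_values, noise)
    return samples


rng = np.random.default_rng(0)
samples = ancestral_sample(parents, mechanisms, num_samples=1000, rng=rng)
```

Because the parents of a node are always sampled first, each mechanism only ever sees values that already exist, which is exactly why the topological sort is needed.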

Let's take a look at the following example:

```python
import numpy as np
import pandas as pd

# Simulate the linear chain X -> Y -> Z with standard normal noise on every node
X = np.random.normal(loc=0, scale=1, size=1000)
Y = 2 * X + np.random.normal(loc=0, scale=1, size=1000)
Z = 3 * Y + np.random.normal(loc=0, scale=1, size=1000)
data = pd.DataFrame(data=dict(X=X, Y=Y, Z=Z))
data.head()
```
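Written out as structural equations, the data-generating process simulated above reads as follows. Since all mechanisms are linear with Gaussian noise, (X, Y, Z) is jointly Gaussian with zero mean, and its covariance matrix follows directly from the equations, e.g. Var(Y) = 2²·1 + 1 = 5, Var(Z) = 3²·5 + 1 = 46, Cov(Y, Z) = 3·Var(Y) = 15 and Cov(X, Z) = 3·Cov(X, Y) = 6:

$$
\begin{aligned}
X &= N_X, \\
Y &= 2X + N_Y, \\
Z &= 3Y + N_Z, \qquad N_X, N_Y, N_Z \overset{\text{iid}}{\sim} \mathcal{N}(0, 1),
\end{aligned}
\qquad
\Sigma_{(X, Y, Z)} = \begin{pmatrix} 1 & 2 & 6 \\ 2 & 5 & 15 \\ 6 & 15 & 46 \end{pmatrix}.
$$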

As in the introduction, we generate data for the simple linear DAG X→Y→Z. Let's define the GCM and fit it to the
data:

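```python
import networkx as nx
import dowhy.gcm as gcm

# Define the causal graph X -> Y -> Z and wrap it in a structural causal model
causal_model = gcm.StructuralCausalModel(nx.DiGraph([('X', 'Y'), ('Y', 'Z')]))
gcm.auto.assign_causal_mechanisms(causal_model, data)  # Automatically assigns additive noise models to non-root nodes
gcm.fit(causal_model, data)  # Fits the causal mechanism of every node to the observed data
```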

We have now learned the generative models of the variables, based on the defined causal graph and the additive noise model assumption.
To generate new samples from this model, we can simply call:

```python
# Draw 1000 new samples by propagating freshly sampled noise through the fitted mechanisms
generated_data = gcm.draw_samples(causal_model, num_samples=1000)
generated_data.head()
```
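To make the fit and sample steps more concrete, the sketch below shows what they amount to for a linear additive noise model: regress each non-root node on its parent, keep the residuals as the empirical noise distribution, and at sampling time resample the root node and the residuals. This is a simplified illustration under the assumption that every mechanism is linear; ``gcm.auto.assign_causal_mechanisms`` may select other regression models, and the helper ``fit_linear_anm`` is hypothetical.

```python
import numpy as np

# Re-create data from the same linear chain X -> Y -> Z as above.
rng = np.random.default_rng(0)
n = 1000
X = rng.normal(0, 1, n)
Y = 2 * X + rng.normal(0, 1, n)
Z = 3 * Y + rng.normal(0, 1, n)


def fit_linear_anm(parent, child):
    """Fit child = slope * parent + intercept + noise by least squares (assumed linear f)."""
    slope, intercept = np.polyfit(parent, child, deg=1)
    residuals = child - (slope * parent + intercept)  # empirical noise distribution
    return slope, intercept, residuals


# "Fitting": one regression per non-root node.
slope_y, icpt_y, noise_y = fit_linear_anm(X, Y)
slope_z, icpt_z, noise_z = fit_linear_anm(Y, Z)

# "Sampling": resample the root node, then propagate with resampled residual noise.
m = 1000
X_new = rng.choice(X, size=m)
Y_new = slope_y * X_new + icpt_y + rng.choice(noise_y, size=m)
Z_new = slope_z * Y_new + icpt_z + rng.choice(noise_z, size=m)
```

The fitted slopes should land close to the true coefficients 2 and 3, and the newly drawn samples should have roughly the same spread as the observed ones.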

If our modeling assumptions are correct, the generated data should resemble the observed data distribution. A quick
way to check this is to estimate the KL divergence between the observed and the generated distribution:

```python
# Align the column order before comparing observed and generated samples
gcm.divergence.auto_estimate_kl_divergence(data.to_numpy(), generated_data[data.columns].to_numpy())
```

Here, we expect the divergence to be (very) small.
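For intuition about what a "small" divergence means, the sketch below uses a swapped-in and much cruder estimator: it approximates both sample sets by multivariate Gaussians and evaluates the closed-form KL divergence between them. This is not how ``auto_estimate_kl_divergence`` works, and the Gaussian approximation is only reasonable here because the example data is jointly Gaussian. The helpers ``gaussian_kl`` and ``simulate_chain`` are hypothetical.

```python
import numpy as np


def gaussian_kl(samples_p, samples_q):
    """KL(P || Q) in nats after approximating both sample sets by multivariate Gaussians.

    Closed form: 0.5 * (tr(S_q^-1 S_p) + d^T S_q^-1 d - k + ln det S_q - ln det S_p),
    with d = mu_q - mu_p and k the number of variables.
    """
    mu_p, mu_q = samples_p.mean(axis=0), samples_q.mean(axis=0)
    cov_p = np.cov(samples_p, rowvar=False)
    cov_q = np.cov(samples_q, rowvar=False)
    cov_q_inv = np.linalg.inv(cov_q)
    diff = mu_q - mu_p
    k = samples_p.shape[1]
    _, logdet_p = np.linalg.slogdet(cov_p)
    _, logdet_q = np.linalg.slogdet(cov_q)
    return 0.5 * (np.trace(cov_q_inv @ cov_p) + diff @ cov_q_inv @ diff - k + logdet_q - logdet_p)


def simulate_chain(rng, n, coef_y=2.0):
    """Samples from X -> Y -> Z; coef_y lets us mimic a mis-specified model."""
    X = rng.normal(0, 1, n)
    Y = coef_y * X + rng.normal(0, 1, n)
    Z = 3 * Y + rng.normal(0, 1, n)
    return np.column_stack([X, Y, Z])


rng = np.random.default_rng(0)
observed = simulate_chain(rng, 1000)
good_model = simulate_chain(rng, 1000)                # stands in for a well-fitted model
bad_model = simulate_chain(rng, 1000, coef_y=-2.0)    # stands in for a badly mis-specified model

print(gaussian_kl(observed, good_model))  # close to 0
print(gaussian_kl(observed, bad_model))   # much larger
```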

**Note**: We **cannot** validate the correctness of a causal graph this way,
since any graph from the same Markov equivalence class as the true graph would be sufficient to generate data that is consistent with the observed data,
but only the true graph would generate the correct interventional and counterfactual distributions. In the example above,
X→Y→Z and X←Y←Z would generate the same observational distribution (since they encode the same conditional independence, X ⊥ Z | Y), but only X→Y→Z would generate the
correct interventional distribution (e.g., when intervening on Y).
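The following sketch illustrates this point with plain NumPy, assuming linear additive noise models fitted by least squares (the helpers ``fit_line`` and ``propagate`` are hypothetical). The reversed chain X←Y←Z reproduces the observed covariance, but under the intervention do(Y=1) the true chain shifts the mean of Z to about 3, while the reversed chain leaves Z untouched, because there Z is a root node upstream of Y.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
# Observational data from the true chain X -> Y -> Z (same mechanisms as in the example).
X = rng.normal(0, 1, n)
Y = 2 * X + rng.normal(0, 1, n)
Z = 3 * Y + rng.normal(0, 1, n)


def fit_line(parent, child):
    """Linear additive noise model: child = slope * parent + intercept + noise."""
    slope, intercept = np.polyfit(parent, child, deg=1)
    return slope, intercept, child - (slope * parent + intercept)


def propagate(model, parent_values):
    """Evaluate a fitted linear mechanism with resampled residual noise."""
    slope, intercept, residuals = model
    return slope * parent_values + intercept + rng.choice(residuals, size=parent_values.size)


# True chain X -> Y -> Z: only Z's mechanism is needed for the intervention below.
fwd_z = fit_line(Y, Z)
# Reversed chain X <- Y <- Z: Z is the root, Y depends on Z, X depends on Y.
rev_y, rev_x = fit_line(Z, Y), fit_line(Y, X)

# Observational sampling from the reversed model matches the observed covariance.
Z_rev = rng.choice(Z, size=n)
Y_rev = propagate(rev_y, Z_rev)
X_rev = propagate(rev_x, Y_rev)
cov_observed = np.cov(np.vstack([X, Y, Z]))
cov_reversed = np.cov(np.vstack([X_rev, Y_rev, Z_rev]))

# Intervention do(Y = 1): Y is set to a constant, so only descendants of Y change.
y_fixed = np.ones(n)
z_do_fwd = propagate(fwd_z, y_fixed)  # true graph: Z still responds to Y
z_do_rev = rng.choice(Z, size=n)      # reversed graph: Z is upstream of Y and unaffected
```

Comparing ``cov_observed`` with ``cov_reversed`` shows the observational agreement, while ``z_do_fwd.mean()`` (about 3) and ``z_do_rev.mean()`` (about 0) show the disagreement under intervention.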