# Causal discovery with `TIGRAMITE`

TIGRAMITE is a Python module for time series analysis. It reconstructs graphical models (conditional independence graphs) from discrete or continuous-valued time series based on the PCMCI framework and creates high-quality plots of the results.

PCMCI is described here:
J. Runge, P. Nowack, M. Kretschmer, S. Flaxman, D. Sejdinovic, 
Detecting and quantifying causal associations in large nonlinear time series datasets. Sci. Adv. 5, eaau4996 (2019) 
https://advances.sciencemag.org/content/5/11/eaau4996

For further versions of PCMCI (e.g., PCMCI+ and LPCMCI), see the corresponding tutorials.

This tutorial explains the causal assumptions and gives walk-through examples. See the following paper for theoretical background:
Runge, Jakob. 2018. “Causal Network Reconstruction from Time Series: From Theoretical Assumptions to Practical Estimation.” Chaos: An Interdisciplinary Journal of Nonlinear Science 28 (7): 075310.

Lastly, the following Nature Reviews Earth & Environment paper provides an overview of causal inference for time series in general: https://github.com/jakobrunge/tigramite/blob/master/tutorials/Runge_Causal_Inference_for_Time_Series_NREE.pdf

In [1]:
# Imports
import numpy as np
import matplotlib
from matplotlib import pyplot as plt
%matplotlib inline
## use `%matplotlib notebook` for interactive figures
# plt.style.use('ggplot')
import sklearn

import tigramite
from tigramite import data_processing as pp
from tigramite.toymodels import structural_causal_processes as toys
from tigramite import plotting as tp
from tigramite.pcmci import PCMCI

from tigramite.independence_tests.parcorr import ParCorr
from tigramite.independence_tests.gpdc import GPDC
from tigramite.independence_tests.cmiknn import CMIknn
from tigramite.independence_tests.cmisymb import CMIsymb

from tigramite.models import LinearMediation, Prediction


## Causal assumptions

Having introduced the basic functionality, we now turn to a discussion of the assumptions underlying a causal interpretation:

  - **Faithfulness / Stableness:** *Independencies in data arise not from coincidence, but rather from causal structure* or, expressed differently, *If two variables are independent given some other subset of variables, then they are not connected by a causal link in the graph*.
  
  - **Causal Sufficiency:** *Measured variables include all of the common causes.*
  
  - **Causal Markov Condition:** *All the relevant probabilistic information that can be obtained from the system is contained in its direct causes* or, expressed differently, *If two variables are not connected in the causal graph given some set of conditions (see Runge Chaos 2018 for further definitions), then they are conditionally independent*.
  
  - **No contemporaneous effects:** *There are no causal effects at lag zero.*
  
  - **Stationarity**
  
  - **Parametric assumptions of independence tests** (these were already discussed in the basic tutorial)
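
To make the Causal Sufficiency and Causal Markov conditions more concrete, the following is a minimal NumPy sketch that does not use tigramite. The variables `u`, `x`, `y`, the coefficient 0.8, and the helper `regress_out` are hypothetical choices for illustration only. A common cause `u` drives both `x` and `y`, which have no causal link between them: if `u` is unobserved, `x` and `y` appear dependent, while conditioning on `u` (here via linear partial correlation) removes the dependence.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Hypothetical toy system: U is a common cause of X and Y; X and Y are not linked.
u = rng.standard_normal(n)
x = 0.8 * u + rng.standard_normal(n)
y = 0.8 * u + rng.standard_normal(n)


def regress_out(target, control):
    """Return the residual of `target` after a least-squares fit on `control`."""
    design = np.column_stack([np.ones_like(control), control])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return target - design @ coef


# Dependence that appears when the common cause is ignored (sufficiency violated).
corr_unobserved = np.corrcoef(x, y)[0, 1]

# Partial correlation given U: the Markov condition implies independence here.
corr_given_u = np.corrcoef(regress_out(x, u), regress_out(y, u))[0, 1]

print(f"corr(X, Y)        = {corr_unobserved:+.3f}")
print(f"parcorr(X, Y | U) = {corr_given_u:+.3f}")
```

The first value should be clearly positive and the second close to zero, up to sampling noise. A method that assumes causal sufficiency but only sees `x` and `y` would therefore report a link that is not causal.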

### Faithfulness

Faithfulness, as stated above, expresses the assumption that the independencies we measure come from the causal structure, i.e., the time series graph, and do not arise from some fine tuning of the parameters. Another unfaithful case is a process containing *purely* deterministic dependencies, i.e., $Y=f(X)$, without any noise. We illustrate these cases in the following.

#### Fine tuning

Suppose in our model we have two ways in which $X^0$ causes $X^2$, a direct one, and an indirect effect $X^0\to X^1 \to X^2$ as realized in the following model:

\begin{align*}
    X^0_t &= \eta^0_t\\
    X^1_t &= 0.6 X^0_{t-1} + \eta^1_t\\
    X^2_t &= 0.6 X^1_{t-1} - 0.36 X^0_{t-2} + \eta^2_t\\
\end{align*}

In [None]:
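seed = 1
T = 500  # number of time steps
random_state = np.random.default_rng(seed=seed)
# Start from independent standard normal noise terms eta^0, eta^1, eta^2
data = random_state.standard_normal((T, 3))
# Start at t=2 (the maximum lag) so that data[t-2] never wraps around to the end of the array
for t in range(2, T):
    # data[t, 0] += 0.6*data[t-1, 1]  # optional feedback, not part of the model above
    data[t, 1] += 0.6*data[t-1, 0]
    data[t, 2] += 0.6*data[t-1, 1] - 0.36*data[t-2, 0]

var_names = [r'$X^0$', r'$X^1$', r'$X^2$']
dataframe = pp.DataFrame(data, var_names=var_names)
# tp.plot_timeseries(dataframe)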

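Substituting $X^1_{t-1} = 0.6 X^0_{t-2} + \eta^1_{t-1}$ into the equation for $X^2_t$ gives

\begin{align*}
    X^2_t &= 0.6 X^1_{t-1} - 0.36 X^0_{t-2} + \eta^2_t\\
          &= 0.6 (0.6 X^0_{t-2} + \eta^1_{t-1}) - 0.36 X^0_{t-2} + \eta^2_t\\
          &= 0.36 X^0_{t-2} - 0.36 X^0_{t-2} + 0.6 \eta^1_{t-1} + \eta^2_t\\
          &= 0.6 \eta^1_{t-1} + \eta^2_t
\end{align*}

The contributions of the direct and the indirect path cancel exactly, so there is no unconditional dependency between $X^0_{t-2}$ and $X^2_t$, and the link $X^0_{t-2} \to X^2_t$ is not detected in the condition-selection step: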

In [None]:
parcorr = ParCorr()
pcmci_parcorr = PCMCI(
    dataframe=dataframe, 
    cond_ind_test=parcorr,
    verbosity=1)
all_parents = pcmci_parcorr.run_pc_stable(tau_max=2, pc_alpha=0.05)
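
The cancellation in this model can also be checked numerically without tigramite. The sketch below is an illustration under stated assumptions, not the PCMCI implementation: it regenerates the model with a longer series (`T = 5000`, chosen here only to reduce sampling noise), uses a hypothetical helper `residual`, and computes a plain linear partial correlation. It compares the marginal correlation between $X^0_{t-2}$ and $X^2_t$ with their partial correlation given $X^1_{t-1}$.

```python
import numpy as np

T = 5000  # longer than in the tutorial cell, to reduce sampling noise (assumption)
rng = np.random.default_rng(1)
eta = rng.standard_normal((T, 3))

# The model has no feedback loops, so it can be generated with shifted arrays.
x0 = eta[:, 0].copy()
x1 = eta[:, 1].copy()
x1[1:] += 0.6 * x0[:-1]                     # X^1_t = 0.6 X^0_{t-1} + eta^1_t
x2 = eta[:, 2].copy()
x2[2:] += 0.6 * x1[1:-1] - 0.36 * x0[:-2]   # X^2_t = 0.6 X^1_{t-1} - 0.36 X^0_{t-2} + eta^2_t


def residual(target, control):
    """Return the residual of `target` after a least-squares fit on `control`."""
    design = np.column_stack([np.ones_like(control), control])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return target - design @ coef


# Align the three series so that index k corresponds to time t = k + 2.
x0_lag2 = x0[:-2]    # X^0_{t-2}
x1_lag1 = x1[1:-1]   # X^1_{t-1}
x2_now = x2[2:]      # X^2_t

marginal_corr = np.corrcoef(x0_lag2, x2_now)[0, 1]
partial_corr = np.corrcoef(
    residual(x0_lag2, x1_lag1), residual(x2_now, x1_lag1)
)[0, 1]

print(f"corr(X0[t-2], X2[t])              = {marginal_corr:+.3f}")
print(f"parcorr(X0[t-2], X2[t] | X1[t-1]) = {partial_corr:+.3f}")
```

The marginal correlation should be close to zero, whereas the partial correlation should be clearly negative, reflecting the direct coefficient $-0.36$. Conditioning on $X^1_{t-1}$ blocks the indirect path, so the direct effect is no longer cancelled.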

However, since the other parent of $X^2$, namely $X^1_{t-1}$, *is* detected, the MCI step conditions on $X^1_{t-1}$ and can reveal the true underlying graph (in this particular case):

In [None]:
results = pcmci_parcorr.run_pcmci(tau_max=2, pc_alpha=0.05, alpha_level=0.01)

tp.plot_graph(
    val_matrix=results['val_matrix'],
    graph=results['graph'],
    var_names=var_names,
    link_colorbar_label='cross-MCI',
    node_colorbar_label='auto-MCI',
)
plt.show()

tp.plot_time_series_graph(
    val_matrix=results['val_matrix'],
    graph=results['graph'],
    var_names=var_names,
    link_colorbar_label='MCI',
)
plt.show()

Note, however, that this does not always work: such cancellation, although a pathological case, can be a problem, especially for smaller sample sizes.