Assumptions for causal inference

* Causal Markov Condition
* Faithfulness
* No causal cycles
* Causal Sufficiency
* No selection bias
* Time series:
    * Time order
    * Stationarity
    * Instantaneous effects
* Assumptions on dependency types (linearity, etc.) and distributions (-> Structural Causal Models)
* Causal approaches typically utilize a subset of these assumptions

Selection bias

* X = 0, 1 treatment in randomized trial
* Y = 0, 1 patient recovery
* Suppose there is no effect of X on Y
* But X = 1 has unpleasant side-effects Z
* Severity of Z is influenced by unobserved general health level L: the sicker, the worse
* L is the only cause of patient recovery
* Patients whose side-effects are too strong drop out of the study (S = 0)
* The data only contain patients who stayed in the study (S = 1 is conditioned on)
* Now X and Y are correlated in the observed data even though X has no effect on Y, and conditioning on the side-effects Z does not remove the dependence
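
The drop-out mechanism above can be illustrated with a small simulation. This is only a sketch under assumed functional forms: the Gaussian health level, the side-effect formula, the drop-out threshold of 1.5 and the names `simulate_trial` and `recovery_gap` are all hypothetical choices, not part of the original example. Treatment has no effect on recovery by construction, yet the recovery rates differ between arms once only the patients with S = 1 are kept.

```python
import random

def simulate_trial(n=100_000, seed=0):
    """Simulate (x, y, stayed) per patient; all functional forms are assumptions."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.randint(0, 1)                       # randomized treatment X
        health = rng.gauss(0, 1)                    # unobserved L (higher = healthier)
        # Side-effects Z occur only under treatment and are worse for sicker patients.
        z = x * (1.0 - health + rng.gauss(0, 0.5))
        stayed = z < 1.5                            # S = 1 unless side-effects are too strong
        y = int(health + rng.gauss(0, 0.5) > 0)     # recovery Y depends on L only, not on X
        rows.append((x, y, stayed))
    return rows

def recovery_gap(rows):
    """Estimate P(Y=1 | X=1) - P(Y=1 | X=0) from the given rows."""
    treated = [y for x, y, _ in rows if x == 1]
    control = [y for x, y, _ in rows if x == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)

rows = simulate_trial()
gap_all = recovery_gap(rows)                              # all randomized patients
gap_selected = recovery_gap([r for r in rows if r[2]])    # only patients with S = 1
print(f"gap in full population: {gap_all:+.3f}")
print(f"gap among stayers:      {gap_selected:+.3f}")
```

In the full population the gap is close to zero, while among the stayers the treated arm looks better, because the sicker treated patients have left the study.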

* DAGs assume no unobserved (latent) and no selection variables; how can these be included?
* A DAG that includes unobserved variables could have arbitrarily many nodes; one option is to constrain the graph and its topology
* ... the other is to represent such relationships with ancestral graph models
* For any DAG including latent and selection variables there is a unique maximal ancestral graph (MAG)
* The MAG over the observed variables alone represents the conditional independence relations entailed by the original DAG

* Mixed graph has 3 kinds of edges: directed (->), bidirected (<->) and undirected (-)
* Kinships: parent -> child, spouse <-> spouse, neighbor - neighbor
* A directed cycle occurs when $X_i \to X_j$ and $X_j$ is an ancestor of $X_i$
* An almost directed cycle occurs when $X_i \leftrightarrow X_j$ and $X_j$ is an ancestor of $X_i$
* $X_k$ is a collider on a path if both adjacent edges have an arrowhead at $X_k$, e.g. $X_i \to X_k \leftarrow X_j$

* An ancestral graph is a mixed graph in which
    1. there is no directed cycle
    2. there is no almost directed cycle
    3. for any undirected edge $X_i - X_j$, $X_i$ and $X_j$ have no parents or spouses
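
The three conditions can be checked mechanically on a small mixed graph. The following is a minimal sketch, assuming a hypothetical edge-list representation in which each edge is a tuple `(a, kind, b)` with `kind` one of `"->"`, `"<->"` or `"-"`; the function name `is_ancestral` is made up for illustration.

```python
def is_ancestral(edges):
    """Check the three ancestral-graph conditions for a list of (a, kind, b) edges."""
    children = {}
    for a, kind, b in edges:
        if kind == "->":
            children.setdefault(a, set()).add(b)

    def strict_descendants(v):
        # Vertices reachable from v along at least one directed edge.
        seen, stack = set(), [v]
        while stack:
            for c in children.get(stack.pop(), ()):
                if c not in seen:
                    seen.add(c)
                    stack.append(c)
        return seen

    # Vertices that have a parent or a spouse, i.e. an arrowhead pointing at them.
    arrowhead_at = set()
    for a, kind, b in edges:
        if kind == "->":
            arrowhead_at.add(b)
        elif kind == "<->":
            arrowhead_at.update((a, b))

    for a, kind, b in edges:
        # 1. directed cycle: a -> b while b is an ancestor of a
        if kind == "->" and a in strict_descendants(b):
            return False
        # 2. almost directed cycle: a <-> b while one endpoint is an ancestor of the other
        if kind == "<->" and (a in strict_descendants(b) or b in strict_descendants(a)):
            return False
        # 3. endpoints of an undirected edge must have no parents or spouses
        if kind == "-" and (a in arrowhead_at or b in arrowhead_at):
            return False
    return True

print(is_ancestral([("A", "->", "B"), ("B", "->", "C")]))
print(is_ancestral([("A", "->", "B"), ("A", "<->", "B")]))
```

The second call fails condition 2, since A is both a spouse and an ancestor of B.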

* A path p between $X_i$ and $X_j$ is active or m-connecting relative to (possibly empty) set Z (with $X_i, X_j \notin Z$) iff
    1. every non-collider on p is not a member of Z
    2. every collider on p is in Z or has a descendant in Z
* $X_i$ and $X_j$ are said to be m-separated by $Z$ if there is no active path between them relative to $Z$
* An ancestral graph is maximal if for any two non-adjacent vertices there is a set of vertices that m-separates them
* A path p between $X_i$ and $X_j$ on which every collider is an ancestor of $\{X_i, X_j\} \cup S$ and every non-collider (other than the endpoints) is in $L$ is called an inducing path relative to $\{L, S\}$. Edges are trivially inducing paths.
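
For small graphs, the m-separation definition above can be checked by brute force: enumerate every simple path between the two vertices and test the two activity conditions. The sketch below assumes a hypothetical edge-list representation `(a, kind, b)` with `kind` in `"->"`, `"<->"`, `"-"` and at most one edge per vertex pair; `m_separated` is an illustrative name, and the enumeration is exponential, so it is not meant for large graphs.

```python
def m_separated(edges, a, b, z):
    """Return True if a and b are m-separated by the set z in the mixed graph."""
    z = set(z)
    mark = {}  # mark[(u, v)] is the edge mark at v on the edge between u and v
    for u, kind, v in edges:
        mark[(u, v)] = "tail" if kind == "-" else "arrow"
        mark[(v, u)] = "arrow" if kind == "<->" else "tail"

    def neighbors(u):
        return [w for (s, w) in mark if s == u]

    def descendants(v):
        # v itself plus everything reachable along directed edges.
        seen, stack = {v}, [v]
        while stack:
            u = stack.pop()
            for w in neighbors(u):
                directed = mark[(u, w)] == "arrow" and mark[(w, u)] == "tail"
                if directed and w not in seen:
                    seen.add(w)
                    stack.append(w)
        return seen

    def active(path):
        for prev, mid, nxt in zip(path, path[1:], path[2:]):
            collider = mark[(prev, mid)] == "arrow" and mark[(nxt, mid)] == "arrow"
            if collider and not (descendants(mid) & z):
                return False   # collider with no descendant (or itself) in z
            if not collider and mid in z:
                return False   # non-collider that is a member of z
        return True

    def paths(path):
        # Depth-first enumeration of simple paths from a to b.
        if path[-1] == b:
            yield path
            return
        for w in neighbors(path[-1]):
            if w not in path:
                yield from paths(path + [w])

    return not any(active(p) for p in paths([a]))

collider = [("X", "->", "Z"), ("Y", "->", "Z"), ("Z", "->", "W")]
print(m_separated(collider, "X", "Y", set()))   # collider Z blocks the path
print(m_separated(collider, "X", "Y", {"W"}))   # conditioning on a descendant of Z opens it
```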

* DAGs are special cases of MAGs
* Markov property: If $X_i$ and $X_j$ are m-separated by Z, then $X_i$ and $X_j$ are probabilistically independent conditional on Z.
* $X_i$ and $X_j$ are m-connected given $Z \cup S$ for any $Z \subseteq X \backslash \{X_i, X_j\}$ iff there is an inducing path between $X_i$ and $X_j$ relative to $\{L, S\}$
* An ancestral graph is maximal iff there is no inducing path relative to $\{\}$ between any two non-adjacent vertices in the graph


1. $X_i, X_j \in X$ are adjacent in the MAG iff there is an inducing path between them relative to $\{L, S\}$ in the DAG
2. Orient edges as follows:
    * $X_i \to X_j$ if $X_i \in An_G(X_j \cup S)$ and $X_j \notin An_G(X_i \cup S)$
    * $X_i \leftrightarrow X_j$ if $X_i \notin An_G(X_j \cup S)$ and $X_j \notin An_G(X_i \cup S)$
    * $X_i - X_j$ if $X_i \in An_G(X_j \cup S)$ and $X_j \in An_G(X_i \cup S)$

Then the MAG probabilistically represents the DAG: for any disjoint $X_i, X_j, Z \subseteq X$, $X_i$ and $X_j$ are entailed to be independent conditional on $Z \cup S$ in the DAG if and only if they are entailed to be independent conditional on $Z$ by the MAG
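
The orientation step of the construction can be sketched as a lookup on ancestor sets. This example assumes the DAG is given as a hypothetical dict mapping each vertex to its children, and that adjacency of the two observed vertices (step 1) has already been established; `ancestors` and `mag_edge` are illustrative names.

```python
def ancestors(dag, targets):
    """Vertices with a directed path into any target; the targets are included."""
    parents = {}
    for u, children in dag.items():
        for c in children:
            parents.setdefault(c, set()).add(u)
    seen, stack = set(targets), list(targets)
    while stack:
        for p in parents.get(stack.pop(), ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen

def mag_edge(dag, a, b, selection=()):
    """Orient the MAG edge between adjacent observed vertices a and b."""
    a_is_anc = a in ancestors(dag, {b, *selection})   # a in An(b ∪ S)
    b_is_anc = b in ancestors(dag, {a, *selection})   # b in An(a ∪ S)
    if a_is_anc and b_is_anc:
        return f"{a} - {b}"
    if a_is_anc:
        return f"{a} -> {b}"
    if b_is_anc:
        return f"{a} <- {b}"
    return f"{a} <-> {b}"

# Latent common cause L of X and Y: neither is an ancestor of the other.
print(mag_edge({"L": ["X", "Y"]}, "X", "Y"))
# X and Y both cause the selection variable S.
print(mag_edge({"X": ["S"], "Y": ["S"]}, "X", "Y", selection={"S"}))
```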

* X -> Y means that X is a cause of Y or of some selection variable, but Y is not a cause of X or of any selection variable
* X <-> Y means that X is not a cause of Y or of any selection variable, and Y is not a cause of X or of any selection variable
* X - Y means that X is a cause of Y or of some selection variable, and Y is a cause of X or of some selection variable 

* Two MAGs $M_1$ and $M_2$ with the same set of vertices are Markov equivalent, if for any three disjoint $X, Y, Z$, $X$ and $Y$ are m-separated by $Z$ in $M_1$ if and only if $X$ and $Y$ are m-separated by $Z$ in $M_2$
* $[M]$ denotes Markov equivalence class; a mark in $M$ is invariant if the mark is the same in all members of $[M]$
* Through conditional independence testing we hope to discover adjacencies and invariant marks of the unknown causal MAG
* Assuming Causal Markov and Faithfulness for the underlying DAG (including latents and selections) leads to correspondence between observable conditional independence relations and m-separation relations in the causal MAG

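* A partial ancestral graph (PAG) $P$ for $[M]$ is a graph whose edge marks are arrowheads, tails or circles, s.t.
    1. $P$ has the same adjacencies as every member of $[M]$
    2. Every non-circle mark in $P$ is an invariant mark in $[M]$
* If, in addition, every circle in $P$ corresponds to a variant mark in $[M]$, then $P$ is called maximally informative for $[M]$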



The Fast Causal Inference (FCI) algorithm is a sound and complete algorithm that outputs a maximally informative PAG, assuming the Causal Markov condition and Faithfulness

1. Skeleton discovery (as for PC algorithm) to find initial skeleton and separating sets
2. Orienting colliders (as for PC algorithm)
3. Further update skeleton
4. Further orient colliders
5. Exhaustively apply 10 orientation rules
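
Steps 1 and 2 are shared with the PC algorithm and can be sketched compactly. The code below is a simplified illustration, not a full FCI implementation: it assumes a perfect independence oracle `indep(a, b, cond)` supplied by the caller (in practice a statistical test), omits steps 3 to 5 entirely, and uses hypothetical function names.

```python
from itertools import combinations

def pc_skeleton(nodes, indep):
    """Step 1: remove edges between pairs that are independent given some subset."""
    adj = {v: set(nodes) - {v} for v in nodes}    # start from the complete graph
    sepset = {}
    size = 0
    while any(len(adj[a]) - 1 >= size for a in nodes):
        for a in nodes:
            for b in list(adj[a]):
                # Try conditioning sets of the current size drawn from a's other neighbors.
                for cond in combinations(sorted(adj[a] - {b}), size):
                    if indep(a, b, set(cond)):
                        adj[a].discard(b)
                        adj[b].discard(a)
                        sepset[frozenset((a, b))] = set(cond)
                        break
        size += 1
    return adj, sepset

def unshielded_colliders(adj, sepset):
    """Step 2: a *-> c <-* b if a, b are non-adjacent and c is not in their separating set."""
    found = []
    for c in sorted(adj):
        for a, b in combinations(sorted(adj[c]), 2):
            if b not in adj[a] and c not in sepset[frozenset((a, b))]:
                found.append((a, c, b))
    return found

def oracle(a, b, cond):
    # Independence oracle for the assumed true graph X -> Z <- Y.
    return {a, b} == {"X", "Y"} and "Z" not in cond

adj, sepset = pc_skeleton(["X", "Y", "Z"], oracle)
print(adj)
print(unshielded_colliders(adj, sepset))
```

With the oracle for X -> Z <- Y, the edge between X and Y is removed with an empty separating set, and Z is then identified as an unshielded collider.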

Observed dependence: Possible causal models? 

Causal Markov Condition: statistical dependence => causal connectedness

Now add Z and assume no unobserved variables

Observed conditional independence: $X \perp\!\!\!\perp Y | Z \iff p(X, Y | Z) = p(X | Z)p(Y | Z)$ 
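
As a numerical illustration of this definition, consider a hypothetical common-cause structure X <- Z -> Y with linear Gaussian relationships (an assumption made only for this sketch). In that setting conditional independence given Z is equivalent to zero partial correlation, which can be computed from the pairwise correlations. The helper names are illustrative.

```python
import random
from math import sqrt

def corr(xs, ys):
    """Pearson correlation of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def partial_corr(xs, ys, zs):
    """Partial correlation of X and Y given a single variable Z."""
    rxy, rxz, ryz = corr(xs, ys), corr(xs, zs), corr(ys, zs)
    return (rxy - rxz * ryz) / sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

rng = random.Random(0)
zs = [rng.gauss(0, 1) for _ in range(50_000)]
xs = [z + rng.gauss(0, 1) for z in zs]   # X depends on Z only
ys = [z + rng.gauss(0, 1) for z in zs]   # Y depends on Z only

print(f"corr(X, Y)     = {corr(xs, ys):.3f}")              # clearly non-zero
print(f"corr(X, Y | Z) = {partial_corr(xs, ys, zs):.3f}")  # close to zero
```

Outside the linear Gaussian case, zero partial correlation does not imply conditional independence, so a different test would be needed.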

Faithfulness assumption: statistical independence => no causal connectedness

Markov equivalence: Cannot distinguish graphs

Suppose $V \perp\!\!\!\perp W, X \perp\!\!\!\perp Y | Z, X \perp\!\!\!\perp V | Z, X \perp\!\!\!\perp W | Z, Y \perp\!\!\!\perp V | Z, Y\perp\!\!\!\perp W | Z$ but all others are dependent $X \not\perp\!\!\!\perp Y, X \not\perp\!\!\!\perp W, Y \not\perp\!\!\!\perp W$

Which causal models explain these, assuming no unobserved variables (latents)?

Find the only possible model (accounting for latents)

Causal inference methods use some of these assumptions, and others (e.g. time order)

Selection bias

X, Y independent

Suppose we only observe samples for S = Rand (selection at random)
Suppose we only observe samples for S = (Y > 0)
Suppose we only observe samples for S = (X > 0)
Suppose we only observe samples for S = (Y > 0) or (X > 0)

Dependent $Y = cX + \eta$ with $S = (Y > 0)$
Dependent $Y = cX + \eta$ with $S = (X > 0)$
Dependent $Y = cX + \eta$ with $S = (Y > 0) \lor (X > 0)$
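
The selection scenarios above can be compared in a short simulation. This is a sketch under assumed standard normal X and noise $\eta$; the function name `selected_corr`, the sample size and the choice c = 1 for the dependent case are hypothetical. It reports the correlation of X and Y among the selected samples only.

```python
import random
from math import sqrt

def corr(xs, ys):
    """Pearson correlation of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def selected_corr(select, c=0.0, n=100_000, seed=0):
    """Correlation of X and Y = c*X + eta among samples where select(x, y) is true."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0, 1)
        y = c * x + rng.gauss(0, 1)   # c = 0 gives independent X and Y
        if select(x, y):
            xs.append(x)
            ys.append(y)
    return corr(xs, ys)

scenarios = {
    "S = (Y > 0)": lambda x, y: y > 0,
    "S = (X > 0)": lambda x, y: x > 0,
    "S = (Y > 0) or (X > 0)": lambda x, y: y > 0 or x > 0,
}
for c in (0.0, 1.0):
    for name, select in scenarios.items():
        print(f"c = {c}: {name:24s} corr = {selected_corr(select, c=c):+.3f}")
```

For independent X and Y, selecting on one variable alone leaves them uncorrelated, whereas selecting on (Y > 0) or (X > 0) makes S a common effect of both and induces a negative correlation.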

Then apply MAG to encode the causal relationships