# Algorithm 1: Generalized ID Algorithm for ioSCMs

**Source:** Forré & Mooij (2019) - "Causal Calculus in the Presence of Cycles, Latent Confounders and Selection Bias"

---

## Algorithm Pseudocode

### Function: ID (Main Function)

$$
\begin{align*}
&\textbf{1: function } \mathbf{ID}(G, \mathbf{Y}, \mathbf{W}, P(\mathbf{V} | do(\mathbf{J}))) \\
&\textbf{2: require: } \mathbf{Y} \subseteq \mathbf{V}, \mathbf{W} \subseteq \mathbf{V}, \mathbf{Y} \cap \mathbf{W} = \emptyset \\
&\textbf{3: } H \leftarrow \text{Anc}^{G_{\mathbf{V} \setminus \mathbf{W}}}(\mathbf{Y}) \\
&\textbf{4: } \textbf{for } C \in \text{CD}(H) \textbf{ do} \\
&\textbf{5: } \quad Q[C] \leftarrow \text{IDCD}(G, C, \text{Cd}^G(C), Q[\text{Cd}^G(C)]) \\
&\textbf{6: } \quad \textbf{if } Q[C] = \text{FAIL} \textbf{ then} \\
&\textbf{7: } \quad\quad \textbf{return } \text{FAIL} \\
&\textbf{8: } \quad \textbf{end if} \\
&\textbf{9: } \textbf{end for} \\
&\textbf{10: } Q[H] \leftarrow \left[ \bigotimes_{C \in \text{CD}(H)} \right] Q[C] \\
&\textbf{11: } \textbf{return } P(\mathbf{Y} | do(\mathbf{J}, \mathbf{W})) = \int Q[H] \, d\mathbf{x}_{H \setminus \mathbf{Y}} \\
&\textbf{12: } \textbf{end function}
\end{align*}
$$

---

### Function: IDCD (Recursive Helper Function)

$$
\begin{align*}
&\textbf{13: function } \mathbf{IDCD}(G, \mathbf{C}, \mathbf{D}, Q[\mathbf{D}]) \\
&\textbf{14: require: } \mathbf{C} \subseteq \mathbf{D} \subseteq \mathbf{V}, \text{CD}(G_{\mathbf{D}}) = \{\mathbf{D}\} \\
&\textbf{15: } A \leftarrow \text{Anc}^{G[\mathbf{D}]}(\mathbf{C}) \cap \mathbf{D} \\
&\textbf{16: } Q[A] \leftarrow \int Q[\mathbf{D}] \, d(\mathbf{x}_{\mathbf{D} \setminus A}) \\
&\textbf{17: } \textbf{if } A = \mathbf{C} \textbf{ then} \\
&\textbf{18: } \quad \textbf{return } Q[A] \\
&\textbf{19: } \textbf{else if } A = \mathbf{D} \textbf{ then} \\
&\textbf{20: } \quad \textbf{return } \text{FAIL} \\
&\textbf{21: } \textbf{else if } \mathbf{C} \subset A \subset \mathbf{D} \textbf{ then} \\
&\textbf{22: } \quad \textbf{for } S \in \mathcal{S}(G[A]) \text{ s.t. } S \subseteq \text{Cd}^{G[A]}(\mathbf{C}) \textbf{ do} \\
&\textbf{23: } \quad\quad R_A[S] \leftarrow P(S | \text{Pred}^G_<(S) \cap A, do(\mathbf{J} \cup \mathbf{V} \setminus A)) \\
&\textbf{24: } \quad \textbf{end for} \\
&\textbf{25: } \quad Q[\text{Cd}^{G[A]}(\mathbf{C})] \leftarrow \bigotimes_{\substack{S \in \mathcal{S}(G[A]) \\ S \subseteq \text{Cd}^{G[A]}(\mathbf{C})}} R_A[S] \\
&\textbf{26: } \quad \textbf{return } \text{IDCD}(G, \mathbf{C}, \text{Cd}^{G[A]}(\mathbf{C}), Q[\text{Cd}^{G[A]}(\mathbf{C})]) \\
&\textbf{27: } \textbf{end if} \\
&\textbf{28: } \textbf{end function}
\end{align*}
$$

### Terms to be familiar with

1. **Well-defined** is used often in this notebook. An operation or function is well-defined when:
    - It always produces a single, unambiguous result
    - The result does not depend on how the input is represented
    - It makes mathematical sense (no contradictions or undefined behavior)

## Line 1 - Function Declaration

$$
\textbf{1: function } \mathbf{ID}(G, \mathbf{Y}, \mathbf{W}, P(\mathbf{V} | do(\mathbf{J})))
$$

---

#### 1. Symbols

| Symbol | Meaning |
|--------|---------|
| $\mathbf{ID}$ | Identification algorithm function |
| $G$ | Directed mixed graph (DMG) with possible cycles |
| $\mathbf{Y}$ | Target/outcome variables (what we want to predict) |
| $\mathbf{W}$ | Intervention/treatment variables (what we manipulate) |
| $\mathbf{V}$ | All observed variables in the system |
| $\mathbf{J}$ | Background intervention variables (fixed experimental conditions) |
| $P(\mathbf{V} \mid do(\mathbf{J}))$ | Observational distribution under background interventions |
| $do(\cdot)$ | Intervention operator  |

---

#### 2. Definitions

**Directed Mixed Graph (DMG) (Definition 4.1):**

A directed mixed graph $G$ consists of:
- A set of nodes $V$ (finite set of observed variables)
- A set of directed edges ($\rightarrow$): denoted $(v_i, v_j) \in E^{\rightarrow}$ or $v_i \rightarrow v_j$
- A set of bidirected edges ($\leftrightarrow$): denoted $(v_i, v_j) \in E^{\leftrightarrow}$ or $v_i \leftrightarrow v_j$

**Key property:** In contrast to acyclic directed mixed graphs (ADMGs), $G$ **may contain directed cycles** (feedback loops).

**Relationship to ioSCMs:** The graph $G$ in Algorithm 1 is the **induced DMG** of an underlying ioSCM (see Definition 5.1), obtained by marginalizing out latent variables from the full graph $G^+ = (V^+ = V \cup U \cup J, E^+)$.

**Variable Sets:**
- $\mathbf{V}$: Set of all observed variables in the system (finite, non-empty)
- $\mathbf{Y} \subseteq \mathbf{V}$: Set of target/outcome variables, $\mathbf{Y} \neq \emptyset$
- $\mathbf{W} \subseteq \mathbf{V}$: Set of intervention/treatment variables, $\mathbf{W} \neq \emptyset$
- $\mathbf{J}$: Set of input (background intervention) nodes, $\mathbf{J}$ may be $\emptyset$. In the ioSCM, $\mathbf{J}$ is disjoint from $\mathbf{V}$ (the full graph has node set $V^+ = V \cup U \cup J$)

**Probability Distribution:**

$P(\mathbf{V} | do(\mathbf{J}))$ is a probability distribution over all possible configurations of variables in $\mathbf{V}$, conditioned on the intervention $do(\mathbf{J})$.

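Formally, for discrete variables: $P(\mathbf{V} | do(\mathbf{J})): \Omega_{\mathbf{V}} \rightarrow [0,1]$ is a probability mass function for each fixed value of $\mathbf{J}$, where $\Omega_{\mathbf{V}}$ is the sample space (all possible configurations of $\mathbf{V}$). For continuous variables, $P$ is read as a density and sums become integrals.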

**Intervention Operator (Definition 2.10):**

The intervention operator $do(\mathbf{X})$ for $\mathbf{X} \subseteq \mathbf{V}$ performs **graph surgery**:
- Creates mutilated graph $G_{\overline{\mathbf{X}}}$ by removing every edge with an arrowhead at a variable in $\mathbf{X}$
- Formally: Removes all directed edges $v \rightarrow w$ with $w \in \mathbf{X}$, and all bidirected edges $v \leftrightarrow w$ with $w \in \mathbf{X}$ (a latent confounder no longer influences an intervened variable)
- **Semantics:** Fixes variables in $\mathbf{X}$ to specific values, overriding their natural causal mechanisms

---

**Input Constraints:**
- $\mathbf{Y} \subseteq \mathbf{V}$ and $\mathbf{Y} \neq \emptyset$ (non-empty target variables)
- $\mathbf{W} \subseteq \mathbf{V}$ and $\mathbf{W} \neq \emptyset$ (non-empty intervention variables)
- $\mathbf{J}$ is the set of input nodes of the ioSCM (background interventions, may be $\emptyset$) and is disjoint from $\mathbf{V}$
- $\mathbf{Y} \cap \mathbf{W} = \emptyset$ (cannot intervene on targets)

---

#### **Symbol Assumptions:**

*Graph ($G$):*
- $G$ is a directed mixed graph (DMG) with a finite set of nodes $V$
- Contains directed edges ($\rightarrow$) and bidirected edges ($\leftrightarrow$)
- **May contain directed cycles** (key property for gene regulatory networks)
- No self-loops: no edges from a node to itself
- Bidirected edges are symmetric: if $v_i \leftrightarrow v_j$ then $v_j \leftrightarrow v_i$
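- Directed edges are ordered pairs: $v_i \rightarrow v_j$ and $v_j \rightarrow v_i$ are distinct edges, and both may be present (a two-node feedback loop), since $G$ may contain directed cycles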
- Mixed edges allowed: Can have both $v_i \rightarrow v_j$ and $v_i \leftrightarrow v_j$
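
The graph assumptions above can be made concrete with a small container type. This is only a sketch: the class name `DMG`, its field names, and the choice to store each bidirected edge as a two-element `frozenset` are assumptions made for illustration, not part of the source. It validates node membership and the no-self-loop rule and nothing more (Python 3.9+ for the built-in generic annotations).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DMG:
    """Hypothetical container for a directed mixed graph (illustrative names)."""

    nodes: frozenset[str]
    directed: frozenset[tuple[str, str]]   # (a, b) encodes a -> b
    bidirected: frozenset[frozenset[str]]  # {a, b} encodes a <-> b (symmetric by construction)

    def __post_init__(self) -> None:
        for a, b in self.directed:
            if a == b:
                raise ValueError(f"self-loop on {a!r}")
            if a not in self.nodes or b not in self.nodes:
                raise ValueError(f"edge {a!r} -> {b!r} uses an unknown node")
        for pair in self.bidirected:
            if len(pair) != 2 or not pair <= self.nodes:
                raise ValueError(f"invalid bidirected edge {set(pair)!r}")


# A feedback loop x -> y -> z -> x plus a mixed edge: z -> x and x <-> z.
g = DMG(
    nodes=frozenset({"x", "y", "z"}),
    directed=frozenset({("x", "y"), ("y", "z"), ("z", "x")}),
    bidirected=frozenset({frozenset({"x", "z"})}),
)
```

Storing bidirected edges as unordered pairs makes the symmetry requirement hold automatically, which is why no separate check for it appears in the sketch.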

*Variable Sets ($\mathbf{Y}, \mathbf{W}, \mathbf{J}, \mathbf{V}$):*
- All finite sets with unique variable identifiers
- Data structure: `frozenset[str]` (hashable, immutable)
- No ordering assumed (sets are unordered)
- Disjointness: $\mathbf{Y} \cap \mathbf{W} = \emptyset$ (required)
- $\mathbf{J}$ is disjoint from $\mathbf{V}$ in the ioSCM, so $\mathbf{J} \cap \mathbf{Y} = \emptyset$ and $\mathbf{J} \cap \mathbf{W} = \emptyset$ hold automatically

*Distribution ($P(\mathbf{V} | do(\mathbf{J}))$):*
- Valid probability measure: $P \in [0, 1]$ and $\sum_{\mathbf{V}} P(\mathbf{V} | do(\mathbf{J})) = 1$
- Positivity: $P(\mathbf{V} | do(\mathbf{J})) > 0$ for all realizable configurations (no structural zeros)
- Markov compatible: Respects conditional independencies encoded in $G$
- Represents post-intervention distribution under graph surgery $G_{\overline{\mathbf{J}}}$

*Intervention Operator ($do(\cdot)$):*
- Graph surgery semantics: Removes every edge with an arrowhead at an intervened variable (incoming directed edges and bidirected edges)
- Idempotent: $do(\mathbf{X}, \mathbf{X}) \equiv do(\mathbf{X})$
- Commutative: $do(\mathbf{X}, \mathbf{Y}) \equiv do(\mathbf{Y}, \mathbf{X})$ for disjoint sets
- Identity: $do(\emptyset)$ performs no intervention


---

#### 3. English Explanation

This function answers: **"Can we predict the effect of intervention $do(\mathbf{W})$ on outcome $\mathbf{Y}$ using only observational data?"**

**Setup:**
- We have a causal graph $G$ describing relationships (including cycles and confounders)
- We have observational data: $P(\mathbf{V} | do(\mathbf{J}))$ collected under fixed conditions $\mathbf{J}$
- We want to know: what would happen to $\mathbf{Y}$ if we intervened on $\mathbf{W}$?

**The Goal:**
Determine if $P(\mathbf{Y} | do(\mathbf{J}, \mathbf{W}))$ can be expressed using only $P(\mathbf{V} | do(\mathbf{J}))$, without actually performing the intervention on $\mathbf{W}$.

---

#### 4. Assumptions

**Mathematical Assumptions:**

*Input Existence:*
- $G$ exists as a well-formed DMG with finite nodes
- Variable sets $\mathbf{Y}, \mathbf{W}, \mathbf{J}, \mathbf{V}$ are finite, well-defined sets
- Probability distribution $P(\mathbf{V} | do(\mathbf{J}))$ exists and is a valid probability measure:
  - $P(\mathbf{V} | do(\mathbf{J})) \in [0, 1]$
  - $\sum_{\mathbf{V}} P(\mathbf{V} | do(\mathbf{J})) = 1$

*Input Validity:*
- Input constraints are satisfied (as defined in Definitions section):
  - $\mathbf{Y} \subseteq \mathbf{V}$, $\mathbf{Y} \neq \emptyset$
  - $\mathbf{W} \subseteq \mathbf{V}$, $\mathbf{W} \neq \emptyset$
  - $\mathbf{J} \cap \mathbf{V} = \emptyset$ (input nodes are separate from observed nodes; $\mathbf{J}$ may be $\emptyset$)
  - $\mathbf{Y} \cap \mathbf{W} = \emptyset$

## Line 2 - Precondition Check

$$
\textbf{2: require: } \mathbf{Y} \subseteq \mathbf{V}, \mathbf{W} \subseteq \mathbf{V}, \mathbf{Y} \cap \mathbf{W} = \emptyset
$$

---

#### 1. Symbols

| Symbol | Meaning |
|--------|---------|
| $\textbf{require}$ | Precondition that must be satisfied before proceeding |
| $\mathbf{Y}$ | Target/outcome variables |
| $\mathbf{W}$ | Intervention/treatment variables |
| $\mathbf{V}$ | All observed variables |
| $\subseteq$ | Subset relation (contained in) |
| $\cap$ | Set intersection |
| $\emptyset$ | Empty set |

---

#### 2. Definitions

**Three preconditions must hold:**

1. **$\mathbf{Y} \subseteq \mathbf{V}$**: Target variables are observed
   - Formally: $\forall y \in \mathbf{Y}, y \in \mathbf{V}$
   - Equivalently: Every target variable must be in the observed variable set

2. **$\mathbf{W} \subseteq \mathbf{V}$**: Intervention variables are observed
   - Formally: $\forall w \in \mathbf{W}, w \in \mathbf{V}$
   - Equivalently: Every intervention variable must be in the observed variable set

3. **$\mathbf{Y} \cap \mathbf{W} = \emptyset$**: Target and intervention sets are disjoint
   - Formally: $\nexists x : (x \in \mathbf{Y} \land x \in \mathbf{W})$
   - Equivalently: No variable can be both a target and an intervention

**Enforcement:** These are **hard constraints**. If any condition is false, the algorithm terminates immediately without returning a result.

---

#### **Symbol Assumptions:**

*Set Operations:*
- Standard set operations ($\subseteq$, $\cap$) are well-defined on finite sets
- Subset checking is decidable for finite sets
- Set intersection is computable for finite sets
- Equality with empty set is decidable

*Variable Sets (from Line 1):*
- $\mathbf{Y}, \mathbf{W}, \mathbf{V}$ are finite sets defined in Line 1
- Set membership and equality are well-defined operations


--- 

#### 3. English Explanation

This line checks three conditions that must all be true before the algorithm can proceed:

**Condition 1: $\mathbf{Y} \subseteq \mathbf{V}$ - "Target variables must be observable"**
- Cannot identify causal effects on variables we don't measure
- Every variable we want to predict must be in our dataset

**Condition 2: $\mathbf{W} \subseteq \mathbf{V}$ - "Intervention variables must be observable"**
- Cannot intervene on variables we cannot observe or control
- Must be able to measure variables we're manipulating to verify the intervention

**Condition 3: $\mathbf{Y} \cap \mathbf{W} = \emptyset$ - "Target and intervention sets cannot overlap"**
- Prevents circular questions like "What is the effect of X on X?"
- If we're fixing X to a specific value, we already know its value
- Cannot simultaneously intervene on X and ask what happens to X

**If any condition fails:** The algorithm immediately stops and cannot proceed.
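
The three checks translate directly into set operations. The following is a minimal sketch; the function name `check_preconditions` and the decision to raise `ValueError` are assumptions, since the pseudocode only states `require` and does not say how a violation is reported.

```python
def check_preconditions(V, Y, W):
    """Raise ValueError unless Y ⊆ V, W ⊆ V and Y ∩ W = ∅ (Line 2)."""
    V, Y, W = frozenset(V), frozenset(Y), frozenset(W)
    if not Y <= V:
        raise ValueError(f"targets not observed: {sorted(Y - V)}")
    if not W <= V:
        raise ValueError(f"interventions not observed: {sorted(W - V)}")
    if Y & W:
        raise ValueError(f"targets and interventions overlap: {sorted(Y & W)}")


check_preconditions(V={"a", "w", "y"}, Y={"y"}, W={"w"})  # passes silently
```

Raising an exception keeps an ill-posed query distinct from the `FAIL` value that the algorithm returns later for a well-posed query it cannot identify.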

---

#### 4. Assumptions

**Mathematical Assumptions (Line 2 - Precondition Check):**

*From Line 1:*
- All variable sets are finite: $|\mathbf{V}| < \infty$, therefore $|\mathbf{Y}|, |\mathbf{W}| < \infty$
- Standard set operations ($\subseteq$, $\cap$) are well-defined and computable on finite sets

*Computational:*
- The precondition checks can be performed in finite time
- All three checks are decidable (can determine true/false algorithmically)

**Note:** These are **preconditions**, not assumptions about the causal model. They define when the identification problem is well-posed. If violated, then the problem is invalid by definition.

---

## Line 3 - Ancestral Closure

$$
\textbf{3: } H \leftarrow \text{Anc}^{G_{\mathbf{V} \setminus \mathbf{W}}}(\mathbf{Y})
$$

---

#### 1. Symbols

| Symbol | Meaning |
|--------|---------|
| $H$ | Set of relevant variables (result of this operation) |
| $\leftarrow$ | Assignment operator |
| $\text{Anc}^G(\mathbf{Y})$ | Ancestors of $\mathbf{Y}$ in graph $G$ |
| $G_{\mathbf{V} \setminus \mathbf{W}}$ | Subgraph induced by nodes $\mathbf{V} \setminus \mathbf{W}$ |
| $\mathbf{V} \setminus \mathbf{W}$ | Set difference: all variables in $\mathbf{V}$ except those in $\mathbf{W}$ |

---

#### 2. Definitions

**Ancestor Set:**

For a graph $G$ with nodes $V$ and subset $\mathbf{Y} \subseteq V$:

$$\text{Anc}^G(\mathbf{Y}) = \{v \in V : \exists \text{ directed path } v \rightarrow \cdots \rightarrow y \text{ in } G \text{ for some } y \in \mathbf{Y}\} \cup \mathbf{Y}$$

**Directed path:** A sequence using **only directed edges** ($\rightarrow$):
$$v \rightarrow v_1 \rightarrow v_2 \rightarrow \cdots \rightarrow v_k \rightarrow y$$

where each arrow represents a directed edge from $E^{\rightarrow}$.

**Important:** 
- Ancestor computation uses **only directed edges** ($\rightarrow$)
- Bidirected edges ($\leftrightarrow$) are **excluded** from directed paths
- Bidirected edges represent unmeasured confounders, not direct causal ancestry
- Reflexivity: $\mathbf{Y} \subseteq \text{Anc}^G(\mathbf{Y})$ always holds

**Set Difference:**

$$\mathbf{V} \setminus \mathbf{W} = \{v \in \mathbf{V} : v \notin \mathbf{W}\}$$

**Induced Subgraph:**

For a DMG $G = (V, E^{\rightarrow}, E^{\leftrightarrow})$ and subset $S \subseteq V$, the induced subgraph $G_S$ is defined as:

$$G_S = (S, E^{\rightarrow}_S, E^{\leftrightarrow}_S)$$

where:
- $E^{\rightarrow}_S = \{(v_i, v_j) \in E^{\rightarrow} : v_i, v_j \in S\}$
- $E^{\leftrightarrow}_S = \{(v_i, v_j) \in E^{\leftrightarrow} : v_i, v_j \in S\}$

The induced subgraph retains only nodes in $S$ and all edges from $G$ whose both endpoints are in $S$.
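
As a small illustration of this definition, the sketch below filters both edge sets by endpoint membership. The function name `induced_subgraph` and the edge encoding (tuples for $\rightarrow$, two-element `frozenset`s for $\leftrightarrow$) are assumptions made for the example.

```python
def induced_subgraph(directed, bidirected, S):
    """Keep only the edges of a DMG whose endpoints both lie in S."""
    S = frozenset(S)
    sub_directed = frozenset((a, b) for a, b in directed if a in S and b in S)
    sub_bidirected = frozenset(e for e in bidirected if e <= S)
    return S, sub_directed, sub_bidirected


directed = frozenset({("z", "w"), ("w", "y"), ("y", "a"), ("a", "c"), ("c", "y")})
bidirected = frozenset({frozenset({"w", "y"}), frozenset({"a", "c"})})

# Drop the node w, as in G_{V \ W} with W = {w}.
nodes, sub_d, sub_b = induced_subgraph(directed, bidirected, {"a", "c", "y", "z"})
```

Every edge touching `w` disappears, both the incoming `z -> w` and the outgoing `w -> y`, along with the bidirected edge between `w` and `y`.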

**Result:** 
$$H = \text{Anc}^{G_{\mathbf{V} \setminus \mathbf{W}}}(\mathbf{Y})$$

This is the set of all ancestors of $\mathbf{Y}$ computed in the induced subgraph $G_{\mathbf{V} \setminus \mathbf{W}}$.

**Properties of $H$:**
- $\mathbf{Y} \subseteq H$ (always includes target variables)
- $H \subseteq \mathbf{V} \setminus \mathbf{W}$ (subset of non-intervention variables)
- $H \cap \mathbf{W} = \emptyset$ (intervention variables excluded by construction)
- $H$ is **ancestral** in $G_{\mathbf{V} \setminus \mathbf{W}}$: $\text{Anc}^{G_{\mathbf{V} \setminus \mathbf{W}}}(H) = H$
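
A hedged sketch of Line 3 as backward reachability over directed edges follows. The function name `ancestors` and the toy graph are illustrative assumptions; bidirected edges are omitted because they play no role in ancestry.

```python
def ancestors(nodes, directed, targets):
    """Anc(targets) in the subgraph induced by `nodes`, targets included."""
    nodes = frozenset(nodes)
    parents = {v: set() for v in nodes}
    for a, b in directed:
        if a in nodes and b in nodes:  # edges leaving the subgraph are ignored
            parents[b].add(a)
    seen = set(targets) & nodes
    stack = list(seen)
    while stack:  # backward reachability; terminates even with cycles
        for p in parents[stack.pop()]:
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return frozenset(seen)


V = frozenset({"a", "b", "c", "w", "y", "z"})
directed = {("b", "z"), ("z", "w"), ("w", "y"), ("y", "a"), ("a", "c"), ("c", "y")}
Y, W = {"y"}, {"w"}

H = ancestors(V - W, directed, Y)  # Line 3: ancestors of Y once W is removed
```

In this example `b` and `z` reach `y` only through `w`, so they are ancestors of `y` in $G$ but drop out of $H$, while the cycle through `a` and `c` stays in.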

---

#### **Symbol Assumptions:**

*Graph Structure:*
- $G$ is the induced DMG from Line 1 (finite, may contain cycles)
- Induced subgraph $G_{\mathbf{V} \setminus \mathbf{W}}$ is well-defined
- Ancestor computation is well-defined even in cyclic graphs (via reachability)

*Set Operations:*
- Set difference $\mathbf{V} \setminus \mathbf{W}$ is well-defined for finite sets
- Ancestor computation terminates in finite time on finite graphs

---


#### 3. English Explanation

This line identifies the **causally relevant variables** for predicting $\mathbf{Y}$ under intervention $do(\mathbf{W})$.

**Two-step process:**

**Step 1: Remove intervention nodes**
- Create modified graph $G_{\mathbf{V} \setminus \mathbf{W}}$ by removing all nodes in $\mathbf{W}$, together with every edge touching them
- This reflects the semantics of $do(\mathbf{W})$: the values of $\mathbf{W}$ are fixed, so no influence on $\mathbf{Y}$ can be transmitted *through* $\mathbf{W}$
- After intervention, $\mathbf{W}$ is determined by the experimenter, not by the causal system

**Step 2: Find ancestors of $\mathbf{Y}$**
- In the modified graph, find all variables with directed paths to any variable in $\mathbf{Y}$
- These are the only variables that can causally influence $\mathbf{Y}$ (after intervening on $\mathbf{W}$)
- Variables with no path to $\mathbf{Y}$ are irrelevant for identification

**Why this matters:**
- Reduces the identification problem to a smaller set $H$ instead of all variables $\mathbf{V}$
- Focuses computation on causally relevant variables only
- Forms the basis for the decomposition in subsequent lines

---

#### 4. Assumptions

**Ancestral Closure Principle**

**From Lemma 9.7 (page 8):** For ancestral set $A$ where $\text{Anc}^G(A) = A$:

$$P_{M[A]}(A \cap \mathbf{V} | do(A \cap \mathbf{J})) = P_M(A \cap \mathbf{V} | do(\mathbf{J} \cup \mathbf{W}))$$

**What this assumes for Line 3:**
- The ancestral set $H = \text{Anc}^{G_{\mathbf{V} \setminus \mathbf{W}}}(\mathbf{Y})$ forms a valid sub-ioSCM (Definition 9.6, page 8)
- Variables outside $H$ are causally irrelevant for computing $P(\mathbf{Y} | do(\mathbf{J}, \mathbf{W}))$

**From Definition 2.10 (page 3):** Removing nodes $\mathbf{W}$ from graph $G$ corresponds to intervention $do(\mathbf{W})$ (graph surgery = intervention)

**From page 9:** Algorithm exploits that causal effects onto ancestral subsets are identifiable

## Line 4 - Loop Over Consolidated Districts

$$
\textbf{4: } \textbf{for } C \in \text{CD}(H) \textbf{ do}
$$

---

#### 1. Symbols

| Symbol | Meaning |
|--------|---------|
| $\textbf{for}$ ... $\textbf{do}$ | Loop control structure |
| $C$ | A consolidated district (subset of variables in $H$) |
| $\in$ | Set membership |
| $\text{CD}(H)$ | Set of all consolidated districts of the induced subgraph $G[H]$ |
| $H$ | Ancestral closure from Line 3 |

---
#### 2. Definitions

**Strongly Connected Component (Definition 2.1, page 2):**

For a directed graph $G = (V, E)$ and $v \in V$:

$$\text{Sc}^G(v) = \text{Anc}^G(v) \cap \text{Desc}^G(v)$$

The strongly connected component of $v$ is the set of all nodes that can reach $v$ AND that $v$ can reach.
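
The definition can be evaluated literally as an intersection of two reachability sets. The sketch below does so with hypothetical helper names (`reachable`, `scc`). It is quadratic overall if called for every node, which is fine for illustration; a linear-time alternative such as Tarjan's algorithm is the usual choice for large graphs.

```python
def reachable(start, neighbours):
    """Nodes reachable from `start` (inclusive) following `neighbours`."""
    seen, stack = {start}, [start]
    while stack:
        for n in neighbours.get(stack.pop(), ()):
            if n not in seen:
                seen.add(n)
                stack.append(n)
    return seen


def scc(directed, v):
    """Sc(v) = Anc(v) ∩ Desc(v), using directed edges only."""
    children, parents = {}, {}
    for a, b in directed:
        children.setdefault(a, set()).add(b)
        parents.setdefault(b, set()).add(a)
    return frozenset(reachable(v, parents) & reachable(v, children))


directed = {("b", "z"), ("z", "w"), ("w", "y"), ("y", "a"), ("a", "c"), ("c", "y")}
cycle = scc(directed, "y")   # y, a, c lie on one directed cycle
single = scc(directed, "w")  # w is on no cycle
```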

**Consolidated District (Definition 9.1, page 8):**

Let $G$ be a DMG with nodes $V$. For $v \in V$, the consolidated district $\text{Cd}^G(v)$ is:

$$
\begin{aligned}
\text{Cd}^G(v) = \{\, w \in V : {} & \exists\, k \geq 1 \text{ nodes } (v_1, \ldots, v_k) \text{ s.t. } v_1 = v,\ v_k = w, \\
& \text{and } \forall i \in \{2,\ldots,k\}: (v_{i-1} \leftrightarrow v_i) \in E^{\leftrightarrow} \text{ or } v_i \in \text{Sc}^G(v_{i-1}) \,\}
\end{aligned}
$$

**In words:** $w \in \text{Cd}^G(v)$ if there exists a sequence of nodes from $v$ to $w$ where each step is either:
1. A bidirected edge ($v_{i-1} \leftrightarrow v_i$), OR
2. Within the same strongly connected component ($v_i \in \text{Sc}^G(v_{i-1})$)

**For subset $B \subseteq V$:**
$$\text{Cd}^G(B) = \bigcup_{v \in B} \text{Cd}^G(v)$$

**Set of Consolidated Districts:**
$$\text{CD}(G) = \{\text{Cd}^G(v) : v \in V\}$$

Note: $\text{CD}(G)$ is the set of **distinct** consolidated districts (duplicates removed).

**Line 4 Specific:**

In Line 4, we compute $\text{CD}(H)$ which means:
- Apply the consolidated district operation to the induced subgraph $G[H]$
- This gives the set of consolidated districts within the ancestral closure $H$

**Properties of Consolidated Districts:**
- Partition $V$: Every node is in exactly one consolidated district
- If $v \in \text{Cd}^G(w)$, then $\text{Cd}^G(v) = \text{Cd}^G(w)$ (equivalence relation)
- In acyclic graphs without bidirected edges: $\text{CD}(G) = \{\{v\} : v \in V\}$ (singletons)
- Generalizes C-components from acyclic to cyclic graphs

---
#### **Symbol Assumptions:**

*Graph Structure:*
- $G[H]$ is the induced subgraph on $H$ (from Line 3)

*Consolidated District Properties:*
- Consolidated districts form a partition of $H$
- Consolidated district computation is well-defined

*Loop Properties:*
- The loop iterates over each distinct consolidated district exactly once

---


#### 3. English Explanation

This line begins a loop that processes each **consolidated district** in $H$ separately.

**What is a consolidated district?**

A maximal set of variables coupled together by:
1. **Latent confounders** (bidirected edges $\leftrightarrow$), OR
2. **Feedback loops** (same strongly connected component)

**Why loop over districts?**
- Variables within a district must be identified together (cannot be separated)
- Different districts can be processed independently
- Divide-and-conquer strategy: solve one district at a time

**In acyclic graphs without confounders:** Each district = one variable

**In cyclic graphs or with confounders:** Districts can contain multiple variables
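
Both linking relations are symmetric, so the consolidated districts are the connected components of the relation "joined by a bidirected edge or in the same strongly connected component". The sketch below computes them with a small union-find. The function name `consolidated_districts`, the edge encoding, and the toy graph are assumptions for illustration.

```python
def consolidated_districts(nodes, directed, bidirected):
    """CD of the subgraph induced by `nodes`, as a set of frozensets."""
    nodes = set(nodes)
    children = {v: set() for v in nodes}
    parents = {v: set() for v in nodes}
    for a, b in directed:
        if a in nodes and b in nodes:
            children[a].add(b)
            parents[b].add(a)

    def reach(start, nbrs):
        seen, stack = {start}, [start]
        while stack:
            for n in nbrs[stack.pop()]:
                if n not in seen:
                    seen.add(n)
                    stack.append(n)
        return seen

    root = {v: v for v in nodes}

    def find(v):
        while root[v] != v:
            root[v] = root[root[v]]  # path halving
            v = root[v]
        return v

    for v in nodes:  # merge each strongly connected component
        for u in reach(v, children) & reach(v, parents):
            root[find(u)] = find(v)
    for a, b in bidirected:  # merge across bidirected edges inside `nodes`
        if a in nodes and b in nodes:
            root[find(a)] = find(b)

    groups = {}
    for v in nodes:
        groups.setdefault(find(v), set()).add(v)
    return {frozenset(g) for g in groups.values()}


V = {"a", "b", "c", "w", "y", "z"}
directed = {("b", "z"), ("z", "w"), ("w", "y"), ("y", "a"), ("a", "c"), ("c", "y")}
bidirected = {frozenset({"b", "w"}), frozenset({"w", "y"})}

cd_full = consolidated_districts(V, directed, bidirected)
cd_h = consolidated_districts({"a", "c", "y"}, directed, bidirected)
```

Districts are always computed relative to the node set passed in, so the same node can sit in a larger district in the full graph than in an induced subgraph.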


---
#### 4. Assumptions

**Decomposition via Consolidated Districts**

**From Proposition 9.8, Point 3 (page 8):** For consolidated district $D \subseteq \mathbf{V}$:

$$P(D | do(\mathbf{J} \cup \mathbf{V} \setminus D)) = \bigotimes_{S \in \mathcal{S}(G), S \subseteq D} P(S | \text{Pred}^G_<(S) \cap \mathbf{V}, do(\mathbf{J}))$$

**What this assumes:**
- Each consolidated district can be identified as an independent subproblem
- This justifies processing districts separately in the loop

**From Definition 9.1 (page 8):** Consolidated districts generalize C-components from acyclic graphs to cyclic DMGs

**From page 9:** Algorithm exploits that causal effects onto consolidated districts are identifiable and processing districts separately maintains soundness

## Line 5 - Calling IDCD for each District

$$
\textbf{5: } Q[C] \leftarrow \text{IDCD}(G, C, \text{Cd}^G(C), Q[\text{Cd}^G(C)])
$$

---
#### 1. Symbols

| Symbol | Meaning |
|--------|---------|
| $Q[C]$ | Identified distribution for district $C$ (result/output) |
| $\leftarrow$ | Assignment operator |
| $\text{IDCD}$ | Helper function for identifying consolidated districts (Line 13) |
| $G$ | Original DMG (directed mixed graph) |
| $C$ | Current consolidated district from Line 4 loop |
| $\text{Cd}^G(C)$ | Consolidated district of $C$ in graph $G$ |
| $Q[\text{Cd}^G(C)]$ | Distribution over $\text{Cd}^G(C)$ |

---

#### 2. Definitions

**IDCD Function (Algorithm 1, Line 13):**

Function signature:
$$\textbf{function } \text{IDCD}(G, C, D, Q[D])$$

**Input constraints (Line 14):**
- $C \subseteq D \subseteq V$
- $\text{CD}(G_D) = \{D\}$ (i.e., $D$ is a single consolidated district in $G_D$)

**Purpose:** Identify the causal effect on $C$ given the consolidated district $D$ and distribution $Q[D]$.

**Line 5 Specific Call:**

The call $\text{IDCD}(G, C, \text{Cd}^G(C), Q[\text{Cd}^G(C)])$ has:
- First argument: $G$ (the full induced DMG)
- Second argument: $C$ (target set to identify)
- Third argument: $\text{Cd}^G(C)$ (consolidated district containing $C$)
- Fourth argument: $Q[\text{Cd}^G(C)]$ (distribution over that district)

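**Key observation:** $C$ is a consolidated district of the induced subgraph $G[H]$ (Line 4), whereas $\text{Cd}^G(C)$ is computed in the full graph $G$. Every link inside $G[H]$ is also a link in $G$, so in general:
$$C \subseteq \text{Cd}^G(C)$$

Equality holds exactly when no node outside $C$ is joined to a node of $C$ by a bidirected edge or a shared strongly connected component in $G$. When $C \subsetneq \text{Cd}^G(C)$, IDCD has to narrow the larger district down to $C$.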

**Return value:**
- **Success:** $Q[C] = P(C | \text{Pred}^{G[H]}_<(C) \cap H, do(\mathbf{J}, V \setminus H))$
- **Failure:** FAIL (checked in Line 6)

**Data structure Q:**

$Q$ is an associative array (dictionary/map) where:
- **Keys:** Sets of variables (districts)
- **Values:** Identified probability distributions over those variables
- Notation: $Q[C]$ retrieves the distribution for district $C$

---

#### **Symbol Assumptions:**

*IDCD Function:*
- IDCD is well-defined for all valid inputs


*Data Structure Q:*
- $Q$ is an associative array supporting key-value storage

*Consolidated District in G:*
- $\text{Cd}^G(C)$ is computable from $G$ and $C$
  
*From Line 4:*
- $C$ is a valid consolidated district from $\text{CD}(H)$
- Loop ensures all districts are processed


---


#### 3. English Explanation

This line calls the helper function **IDCD** to identify the causal effect for the current consolidated district $C$.

**What happens in this line:**

1. **Call IDCD:** Invoke the consolidated district identification function
2. **Pass current district:** $C$ is the district we're trying to identify (from Line 4 loop)
3. **Pass context:** $\text{Cd}^G(C)$ provides the consolidated district context
4. **Pass distribution:** $Q[\text{Cd}^G(C)]$ gives the distribution to work with
5. **Store result:** The result is stored in $Q[C]$ for later use

**Why call IDCD?**

Each consolidated district requires its own identification procedure because:
- Districts may have internal cycles (strongly connected components)
- Districts may have internal confounders (bidirected edges)
- Standard identification techniques don't directly apply to such structures

**What IDCD does (high-level):**

IDCD uses a **recursive strategy** that alternates between:
1. **Ancestral closure** (find causally relevant variables)
2. **District decomposition** (break into smaller consolidated districts)

This continues until either:
- **Success:** The district is identified (returns a distribution)
- **Failure:** Identification is impossible (returns FAIL)


**The role of $\text{Cd}^G(C)$:**

$C$ comes from $\text{CD}(H)$ in Line 4, so it is a consolidated district of $G[H]$. The third argument $\text{Cd}^G(C)$ is the consolidated district of the full graph $G$ that contains $C$, so:
$$C \subseteq \text{Cd}^G(C)$$

The third and fourth arguments describe $C$ itself only when equality holds. Otherwise:
- IDCD starts from the larger district $D = \text{Cd}^G(C)$ and its distribution $Q[D]$
- Recursive calls shrink $D$ step by step while keeping $C \subseteq D$, until $D$ reduces to $C$ or the recursion fails

---

#### 4. Assumptions

**Mathematical Assumptions:**
- **Proposition 9.8, Point 2 (page 8):** Sub-ioSCMs preserve causal effects
- **Lemma 9.7 (page 8):** Ancestral subsets are identifiable  
- **Remark 9.3 (page 8):** Apt-orders exist for decomposition

**Statistical Assumption:**
- **Theorem 9.10 (page 9):** Density condition ensures factorization is well-defined

**Algorithmic Assumption:**
- **Remark 9.11 (page 9):** Alternating ancestral closure and district decomposition converges

## Lines 6-9 - Checking for Identification Failure, Error Handling, and Loop Completion


$$
\begin{align*}
\textbf{6: } & \textbf{if } Q[C] = \text{FAIL} \textbf{ then} \\
\textbf{7: } & \quad \textbf{return } \text{FAIL} \\
\textbf{8: } & \textbf{end if} \\
\textbf{9: } & \textbf{end for}
\end{align*}
$$

---

#### 1. Symbols

| Symbol | Meaning |
|--------|---------|
| $\textbf{if}$ ... $\textbf{then}$ | Conditional control structure |
| $Q[C]$ | Result from IDCD call (Line 5) |
| $\text{FAIL}$ | Special return value indicating non-identifiability |
| $\textbf{return}$ | Exit function and return value |
| $\textbf{end if}$ | Close conditional block |
| $\textbf{end for}$ | Close loop from Line 4 |

---
#### 2. Definitions

**Control flow:**

**Line 6:** Check if identification of district $C$ failed
- If `Q[C] = FAIL` → Execute Line 7
- If `Q[C] ≠ FAIL` → Skip Line 7 and fall through to Lines 8-9

**Line 7:** Terminate entire algorithm with failure
- Return `FAIL` immediately
- Stop processing remaining districts
- Algorithm exits here

**Line 8:** End of conditional block
- Closes the `if` statement from Line 6

**Line 9:** End of district loop
- Closes the `for` loop from Line 4
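- Reached at the end of each iteration in which $C$ was identified; control then returns to Line 4 if districts remain
- Once all districts have been successfully identified, execution continues to Line 10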

**Execution Paths:**

**Path 1 (Failure):** 
$$\text{Line 4} \rightarrow \text{Line 5} \rightarrow \text{Line 6 (true)} \rightarrow \text{Line 7 (exit)}$$
Any district fails identification → Algorithm returns $\text{FAIL}$

**Path 2 (Success):**
$$\text{Line 4} \rightarrow \text{Line 5} \rightarrow \text{Line 6 (false)} \rightarrow \text{Line 9} \rightarrow \text{Line 4} \rightarrow \cdots \rightarrow \text{Line 10}$$
Each district is identified in turn → once none remain, continue to combine results

---
#### **Symbol Assumptions:**

*Control Flow:*
- Conditional and loop structures execute correctly

*From Line 5:*
- $Q[C]$ contains either a valid distribution or $\text{FAIL}$


*Loop Invariant:*
- After Line 9, all processed districts have $Q[C] \neq \text{FAIL}$

---


#### 3. English Explanation

These lines are the error handling for the district identification loop: they decide whether to continue with the next district or abort.

**The logic:**

**Line 6:** After attempting to identify district $C$ (Line 5), check if it failed

**Line 7:** If any district fails identification:
- Stop immediately (early termination)
- Return `FAIL` to caller
- Don't waste time processing remaining districts

**Lines 8-9:** Close the control structures:
- End the `if` block (Line 8)
- End the `for` loop (Line 9)

**Two possible outcomes of the loop:**

**Outcome 1: At least one district failed**
- Algorithm terminated at Line 7
- Never reached Line 9
- Returns `FAIL`

**Outcome 2: All districts succeeded**
- Loop completed normally (Line 9)
- All `Q[C]` values contain valid distributions
- Continue to Line 10 to combine results
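
The control flow of Lines 4-9 can be sketched with a stand-in for IDCD. Everything named here is hypothetical: the `FAIL` sentinel, `identify_all`, and `stub_idcd` exist only to show the early exit and do not implement any identification logic.

```python
FAIL = object()  # hypothetical sentinel standing in for the pseudocode's FAIL


def identify_all(districts, idcd):
    """Lines 4-9: call `idcd` per district and stop at the first FAIL."""
    Q = {}
    for C in districts:        # Line 4
        Q[C] = idcd(C)         # Line 5
        if Q[C] is FAIL:       # Line 6
            return FAIL        # Line 7
    return Q                   # loop finished: every district identified


calls = []


def stub_idcd(C):
    """Stand-in for IDCD: fails on any district containing 'bad'."""
    calls.append(C)
    return FAIL if "bad" in C else f"Q[{sorted(C)}]"


districts = [frozenset({"a"}), frozenset({"bad"}), frozenset({"c"})]
outcome = identify_all(districts, stub_idcd)
```

After the run, `calls` holds only the first two districts, which shows that the third is never processed once a failure occurs.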

---

#### 4. Assumptions

*Termination Properties:*
- Early return (Line 7) properly exits the function

*Correctness:*
- If IDCD returns FAIL for any district, the algorithm cannot assemble the overall effect and returns FAIL
  - This follows from the decomposition in Proposition 9.8
  - Cannot compute joint $P(H | ...)$ if missing $P(C | ...)$ for any $C \in \text{CD}(H)$

## Line 10 - Recomposition via Product

$$
\textbf{10: } Q[H] \leftarrow \left[\bigotimes_{C \in \text{CD}(H)}\right] Q[C]
$$

---

#### 1. Symbols

| Symbol | Meaning |
|--------|---------|
| $Q[H]$ | Combined distribution over ancestral set $H$ (result) |
| $\leftarrow$ | Assignment operator |
| $\bigotimes$ | Product operator (factorized product) |
| $C$ | Loop variable over consolidated districts |
| $\in$ | Set membership |
| $\text{CD}(H)$ | Set of consolidated districts in $H$ (from Line 4) |
| $Q[C]$ | Distribution for district $C$ (from Line 5) |
| $H$ | Ancestral closure from Line 3 |


---

#### 2. Definitions

**Product Operation:**

$$Q[H] = \bigotimes_{C \in \text{CD}(H)} Q[C]$$

This computes the **product** (factorization) of all district distributions.

**Expanded form:**

If $\text{CD}(H) = \{C_1, C_2, \ldots, C_n\}$, then:

$$Q[H] = Q[C_1] \otimes Q[C_2] \otimes \cdots \otimes Q[C_n]$$

**Semantic interpretation:**

Each $Q[C]$ represents:
$$Q[C] = P(C | \text{Pred}^{G[H]}_<(C) \cap H, do(\mathbf{J}, V \setminus H))$$

The product reconstructs the joint distribution:
$$Q[H] = P(H | do(\mathbf{J}, V \setminus H))$$

**Product operator properties:**
- **Commutative:** Order of multiplication doesn't matter
- **Associative:** Grouping doesn't matter
- **Identity:** Empty product equals 1

---
#### **Symbol Assumptions:**

*From Lines 4-9:*
- All districts successfully identified: $Q[C] \neq \text{FAIL}$ for all $C \in \text{CD}(H)$

*Product Operation:*
- Product $\bigotimes$ is well-defined for probability distributions

*Partition Property:*
- Consolidated districts partition $H$: $\bigcup_{C \in \text{CD}(H)} C = H$ and disjoint

---
#### 3. English Explanation

This line combines all the individually identified consolidated district distributions into one joint distribution over $H$.

**What happens:**
- The loop (Lines 4-9) identified each district separately: $Q[C_1], Q[C_2], \ldots, Q[C_n]$
- Line 10 multiplies them together: $Q[H] = Q[C_1] \otimes Q[C_2] \otimes \cdots \otimes Q[C_n]$
- Result is the full distribution over all variables in the ancestral set $H$

**Why this works:**
- Consolidated districts partition $H$ (no overlap, complete coverage)
- Proposition 9.8 guarantees the product correctly reconstructs the joint distribution
- Order of multiplication doesn't matter (Theorem 9.10 ensures this)
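
For discrete variables the recomposition can be illustrated by multiplying factors pointwise. This is a simplified sketch under stated assumptions: each $Q[C]$ is modelled as a plain function from a full assignment to a number, the two binary districts and their probabilities are invented for the example, and the general $\bigotimes$ on distributions in the source is replaced by ordinary multiplication.

```python
from itertools import product as cartesian
from math import isclose, prod


def combine(factors):
    """Pointwise product of factors; each factor maps an assignment dict to a float."""
    return lambda x: prod(f(x) for f in factors)


# Hypothetical binary example with districts {a} and {b}, where a precedes b.
def q_a(x):
    return 0.3 if x["a"] == 1 else 0.7      # P(a)


def q_b(x):
    p1 = 0.9 if x["a"] == 1 else 0.2        # P(b = 1 | a)
    return p1 if x["b"] == 1 else 1.0 - p1


q_h = combine([q_a, q_b])
total = sum(q_h({"a": a, "b": b}) for a, b in cartesian([0, 1], repeat=2))
assert isclose(total, 1.0)  # the product is a normalized joint over H = {a, b}
```

Swapping the list order in `combine` gives the same values, matching the order-independence noted above for this pointwise setting.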

--- 

#### 4. Assumptions

*From Previous Lines:*
- All districts in $\text{CD}(H)$ successfully identified (Lines 4-9)
- Each $Q[C]$ is a valid probability distribution
- Consolidated districts partition $H$ (Line 4)

*Factorization Principle (Proposition 9.8, Point 1, page 8):*

For the full graph:
$$P(V | do(\mathbf{J})) = \bigotimes_{S \in \mathcal{S}(G), S \subseteq V} P(S | \text{Pred}^G_<(S) \cap V, do(\mathbf{J}))$$

where $\mathcal{S}(G)$ is the set of strongly connected components.

**Applied to $H$:** Line 10 applies this factorization to the ancestral set $H$.

*Density Condition (Theorem 9.10, page 9):*

For every strongly connected component $S \subseteq V$, there exists a measure $\mu_S$ such that:
$$P(V | do(\mathbf{J})) \text{ has a density w.r.t. } \bigotimes_{S \in \mathcal{S}(G), S \subseteq V} \mu_S$$

**What this ensures:**
- The product $\bigotimes_{C \in \text{CD}(H)} Q[C]$ is well-defined
- Factorization correctly reconstructs the joint distribution
- Order of multiplication doesn't matter (commutativity holds)

## Lines 11-12 - Marginalization and Return, End of the ID Function

$$
\begin{align*}
\textbf{11: } & \textbf{return } P(\mathbf{Y} | do(\mathbf{J}, \mathbf{W})) = \int Q[H] \, dx_{H \setminus \mathbf{Y}} \\
\textbf{12: } & \textbf{end function}
\end{align*}
$$

---

#### 1. Symbols

| Symbol | Meaning |
|--------|---------|
| $\textbf{return}$ | Exit function and return value to caller |
| $P(\mathbf{Y} \| do(\mathbf{J}, \mathbf{W}))$ | Target causal effect (algorithm output) |
| $=$ | Equality (definitional) |
| $\int$ | Integration (marginalization) operator |
| $Q[H]$ | Joint distribution over ancestral set $H$ (from Line 10) |
| $dx_{H \setminus \mathbf{Y}}$ | Differential element for integration over $H \setminus \mathbf{Y}$ |
| $H \setminus \mathbf{Y}$ | Variables in $H$ not in $\mathbf{Y}$ (variables to marginalize out) |
| $\mathbf{Y}$ | Target variables (from Line 1) |
| $\mathbf{J}$ | Background intervention variables (from Line 1) |
| $\mathbf{W}$ | Intervention variables (from Line 1) |
| $\textbf{end function}$ | Close function definition |
---

#### 2. Definitions

**Marginalization Operation:**

$$P(\mathbf{Y} | do(\mathbf{J}, \mathbf{W})) = \int Q[H] \, dx_{H \setminus \mathbf{Y}}$$

**For continuous variables:** Integration marginalizes out unwanted variables

**For discrete variables (equivalent):**
$$P(\mathbf{Y} | do(\mathbf{J}, \mathbf{W})) = \sum_{x_{H \setminus \mathbf{Y}}} Q[H]$$

**What gets marginalized:**
- $H \setminus \mathbf{Y}$: All variables in ancestral set $H$ that are not in target $\mathbf{Y}$
- These are "intermediate" variables: causally relevant but not targets
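
In the discrete case the marginalization is a sum over the table axes belonging to $H \setminus \mathbf{Y}$. The sketch below assumes $Q[H]$ is stored as a NumPy array with one axis per variable; the function name `marginalize`, the axis labels, and the numbers are illustrative assumptions.

```python
import numpy as np


def marginalize(q_h, axes, keep):
    """Sum the table `q_h` (one axis per name in `axes`) over every variable not in `keep`."""
    drop = tuple(i for i, name in enumerate(axes) if name not in keep)
    return q_h.sum(axis=drop)


# Hypothetical Q[H] over H = (Z, Y), both binary; the entries sum to 1.
q_h = np.array([[0.10, 0.30],
                [0.25, 0.35]])

# Sum out Z, leaving a table over Y only.
p_y = marginalize(q_h, axes=("Z", "Y"), keep={"Y"})

# When H equals the target set there is nothing to sum out: an empty axis
# tuple leaves the table unchanged.
unchanged = marginalize(q_h, axes=("Z", "Y"), keep={"Z", "Y"})
```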

**From Line 3:** $H = \text{Anc}^{G_{V \setminus W}}(\mathbf{Y})$, which guarantees:
- $\mathbf{Y} \subseteq H$ (targets always in ancestral set)
- $H \setminus \mathbf{Y} = \text{Anc}^{G_{V \setminus W}}(\mathbf{Y}) \setminus \mathbf{Y}$ (proper ancestors)
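
The ancestral set itself can be computed by a backward search over directed edges. This is a sketch under two assumptions: $G_{V \setminus W}$ is read as the subgraph induced on $V \setminus W$, and ancestors include the targets themselves. The function name `ancestors` and the example graph are hypothetical.

```python
def ancestors(directed_edges, nodes, targets):
    """Ancestors of `targets` (targets included) in the subgraph induced on `nodes`."""
    nodes = set(nodes)
    parents = {v: set() for v in nodes}
    for u, v in directed_edges:
        if u in nodes and v in nodes:   # keep only edges inside the induced subgraph
            parents[v].add(u)
    found = set(targets) & nodes
    stack = list(found)
    while stack:
        v = stack.pop()
        for p in parents[v] - found:
            found.add(p)
            stack.append(p)
    return found


# Hypothetical graph: Z -> X -> Y, W -> Y, plus a feedback edge Y -> X.
V = {"W", "X", "Y", "Z"}
edges = [("Z", "X"), ("X", "Y"), ("W", "Y"), ("Y", "X")]

# H = ancestors of Y after removing the intervened node W.
H = ancestors(edges, V - {"W"}, {"Y"})
```

Because the search starts from the targets, the targets are always contained in the result, and the removed node `W` can never appear in it.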

**Function Return Values:**

**Success path (Lines 1-11):**
- Returns: $P(\mathbf{Y} | do(\mathbf{J}, \mathbf{W}))$ as an expression in terms of the input distribution $P(\mathbf{V} | do(\mathbf{J}))$ (purely observational when $\mathbf{J} = \emptyset$)
- This is the **identified causal effect**

**Failure path (Line 7):**
- Returns: $\text{FAIL}$
- Means: Algorithm cannot identify the causal effect with this method
---

#### **Symbol Assumptions:**

*From Previous Lines:*
- $Q[H]$ is a valid joint distribution (from Line 10)
- $\mathbf{Y} \subseteq H$ (from Line 3)
- $H \subseteq V \setminus W$ (from Line 3)

*Marginalization:*
- Marginalization is well-defined


*Set Difference:*
- $H \setminus \mathbf{Y}$ is well-defined


*Edge Case:*
- If $H = \mathbf{Y}$ (no proper ancestors), then $H \setminus \mathbf{Y} = \emptyset$
  - **Testable:** Empty marginalization returns $Q[H]$ unchanged
  - **Testable:** $\int Q[H] \, dx_{\emptyset} = Q[H]$


---

#### 3. English Explanation

These lines complete the ID function by marginalizing to the target variables and returning the result.

**Line 11 - Marginalization:**
- Takes the joint distribution $Q[H]$ over all ancestors of $\mathbf{Y}$
- Integrates out all variables except those in target set $\mathbf{Y}$
- Produces the desired causal effect $P(\mathbf{Y} | do(\mathbf{J}, \mathbf{W}))$

**Line 12 - Function closure:**
- Marks the end of the ID function (Lines 1-12)

**Why marginalization works:**
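- $H$ contains $\mathbf{Y}$ together with all of its ancestors in $G_{V \setminus W}$, i.e. every variable that can still causally influence $\mathbf{Y}$ once $\mathbf{W}$ is intervened on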
- Some variables in $H$ were needed for identification but aren't in the target
- Marginalization removes these intermediate variables, leaving only $\mathbf{Y}$

**Algorithm completes successfully:** Returns the causal effect expressed in terms of the input distribution $P(\mathbf{V} | do(\mathbf{J}))$

---
#### 4. Assumptions

*From Previous Lines:*
- $Q[H]$ is a valid joint distribution (Line 10)
- $\mathbf{Y} \subseteq H$ (Line 3 guarantees targets are in ancestral set)
- All consolidated districts successfully identified (Lines 4-9)

*Probability Theory:*

**Marginalization Correctness:**
For joint distribution $P(X, Y)$:
$$P(Y) = \int P(X, Y) \, dx$$

This is a fundamental result from probability theory ensuring marginalization produces the correct marginal.

**Applied to Line 11:**
$$P(\mathbf{Y} | do(\mathbf{J}, \mathbf{W})) = \int Q[H] \, dx_{H \setminus \mathbf{Y}}$$

is the correct marginal distribution over $\mathbf{Y}$ from the joint $Q[H]$.

*Ancestral Sufficiency (Lemma 9.7, page 8):*

For ancestral set $A \subseteq V$ where $\text{Anc}^G(A) = A$:

$$P_{M[A]}(A \cap V | do(A \cap \mathbf{J})) = P_M(A \cap V | do(\mathbf{J} \cup W))$$

for any $W \subseteq V \setminus (A \cap V)$ that contains $(\text{Pa}^G(A) \cap V) \setminus (A \cap V)$.

**What this ensures:**
- Working with ancestral set $H$ (instead of full $V$) is sufficient
- Marginalizing $Q[H]$ gives the same result as marginalizing from the full distribution
- No information loss from focusing on ancestors only

*From Proposition 9.8 and Theorem 9.10:*
- The factorization in Line 10 and marginalization in Line 11 correctly identify the causal effect
- If the algorithm returns a result (not FAIL), that result is the correct causal effect


## Line 13 - Function IDCD (Recursive Helper Function)


$$
\textbf{13: function } \text{IDCD}(G, C, D, Q[D])
$$

---

#### 1. Symbols

| Symbol | Meaning |
|--------|---------|
| $\textbf{function}$ | Function definition keyword |
| $\text{IDCD}$ | Identify Consolidated District (recursive helper function) |
| $G$ | Directed mixed graph (DMG) from Line 1 |
| $C$ | Target variables to identify (where $C \subseteq D$) |
| $D$ | Consolidated district containing $C$ (where $\text{CD}(G_D) = \{D\}$) |
| $Q[D]$ | Probability distribution over consolidated district $D$ |
---

#### 2. Definitions

**IDCD Function Signature (Algorithm 1, Line 13):**

$$\textbf{function } \text{IDCD}(G, C, D, Q[D])$$

**Purpose:** Identify the causal effect on target set $C$ contained within consolidated district $D$.

**Input Parameters:**
1. $G = (V, E^{\rightarrow}, E^{\leftrightarrow})$: Directed mixed graph (Definition 4.1, page 4)
2. $C \subseteq D \subseteq V$: Target variable set
3. $D \subseteq V$: Consolidated district where $\text{CD}(G_D) = \{D\}$ (Definition 9.1, page 8)
4. $Q[D]$: Probability distribution over variables in $D$

**Output:**
- **Success:** Returns identified expression for causal effect
- **Failure:** Returns FAIL

**Relationship Between $C$ and $D$:**

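At **initial call from Line 5:**
- $C \in \text{CD}(H)$ (a consolidated district of $G[H]$ from the main loop)
- $D = \text{Cd}^G(C)$ (the consolidated district of the full graph $G$ that contains $C$)
- Therefore: $C \subseteq D$ at initial invocation, with $C = D$ only when $C$ is already a consolidated district of $G$

During **recursive calls (Line 26):**
- $C$ stays fixed
- $D$ is replaced by the smaller district $\text{Cd}^{G[A]}(C)$, where $A = \text{Anc}^{G[D]}(C) \cap D$ is the ancestral set from Line 15
- Therefore: $C \subseteq \text{Cd}^{G[A]}(C) \subseteq A \subsetneq D$, so the district strictly shrinks with each recursion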

**Induced Subgraph $G_D$ (Definition 5.1, page 5):**

$$G_D = G[D] = (D, E^{\rightarrow}_D, E^{\leftrightarrow}_D)$$

where:
- $E^{\rightarrow}_D = \{(u, v) \in E^{\rightarrow} : u, v \in D\}$
- $E^{\leftrightarrow}_D = \{(u, v) \in E^{\leftrightarrow} : u, v \in D\}$
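
The induced subgraph amounts to filtering both edge sets by membership in $D$. A minimal sketch, assuming edges are stored as pairs of node names (the function name `induced_subgraph` is hypothetical):

```python
def induced_subgraph(nodes, directed, bidirected, d):
    """Return (D, directed edges within D, bidirected edges within D)."""
    d = frozenset(d) & frozenset(nodes)
    directed_d = {(u, v) for (u, v) in directed if u in d and v in d}
    bidirected_d = {(u, v) for (u, v) in bidirected if u in d and v in d}
    return d, directed_d, bidirected_d
```

For example, with directed edges `A -> B -> C`, a bidirected edge `A <-> C`, and `d = {"A", "B"}`, only the directed edge `("A", "B")` survives.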

**Consolidated District Constraint (Definition 9.1, page 8):**

The constraint $\text{CD}(G_D) = \{D\}$ means $D$ forms a single consolidated district in the induced subgraph $G_D$. All variables in $D$ are coupled via:
- Strongly connected components (cycles), OR
- Bidirected edges (unmeasured confounders)
---

#### **Symbol Assumptions:**

*Function Properties:*
- IDCD is recursive
- Deterministic output for given inputs

*Subset Constraints:*
- $C \subseteq D \subseteq V$ (nested containment)
- After recursion: $C \subsetneq D$ or $C = D$

*Consolidated District Property:*
- $\text{CD}(G_D) = \{D\}$ (single district in induced subgraph)

*Graph Parameter:*
- $G$ is unchanged throughout recursion

*Distribution Parameter:*
- $Q[D]$ has domain exactly $D$
- $Q$ is an associative array with key $D$

---

#### 3. English Explanation

This line declares **IDCD** (Identify Consolidated District), the recursive helper function that identifies causal effects within a single consolidated district.

**Purpose:**

IDCD solves the identification problem for target variables $C$ contained within consolidated district $D$. The district $D$ may contain:
- **Feedback loops** (directed cycles)
- **Unmeasured confounders** (bidirected edges)
- **Complex causal structure** not decomposable into smaller districts

**Why IDCD is Needed:**

The main ID algorithm (Lines 1-12) decomposes the problem into consolidated districts, but **within each district**, standard identification techniques fail due to cycles. IDCD handles this using a recursive strategy.

**The Parameters Explained:**

**$G$ (Graph):**
- The full directed mixed graph from Line 1
- Remains constant throughout recursion
- Provides structural information for all subgraph operations

**$C$ (Target Set):**
- Variables we want to identify the effect on
- Initially a consolidated district of $G[H]$ (from Line 5), contained in $D = \text{Cd}^G(C)$
- Stays fixed during recursion while $D$ shrinks around it
- Constraint: $C \subseteq D$ always holds

**$D$ (District):**
- The consolidated district containing our targets
- Forms a single consolidated district: $\text{CD}(G_D) = \{D\}$
- All variables in $D$ are coupled via cycles or confounders
- May shrink during recursion

**$Q[D]$ (Distribution):**
- Probability distribution over all variables in district $D$
- Provides the "raw material" for identification
- Initially $Q[\text{Cd}^G(C)]$, passed in by the caller at Line 5
- Used to construct identification formula in later lines (Line 16)

**Recursive Strategy (Algorithm 2):**

IDCD follows a three-case structure:
1. **Marginalize to ancestors** (Lines 15-16): Compute $A = \text{Anc}^{G[D]}(C) \cap D$ and marginalize $Q[D]$ to $Q[A]$
2. **Check three cases** (Lines 17-27):
   - **Case 1 ($A = C$):** Base case, return $Q[A]$ (Line 18)
   - **Case 2 ($A = D$):** Cannot restrict further, return FAIL (Line 20)
   - **Case 3 ($C \subset A \subset D$):** Decompose via strongly connected components, recurse (Line 26)

**Termination Guarantee:**

Each recursive call either:
- Returns $Q[A]$ (Line 18)
- Returns FAIL (Line 20)
- Recurses on smaller consolidated district (Line 26)

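Since $V$ is finite and the district $D$ strictly shrinks with every recursive call (Case 3 requires $A \subsetneq D$, and the next district lies inside $A$), recursion must terminate.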
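
The case split and the termination argument can be traced on sets alone, without any distributions. The sketch below is not the paper's implementation: it assumes the recursive call keeps $C$ fixed and replaces $D$ by $\text{Cd}^{G[A]}(C)$, it returns the strings `"OK"` and `"FAIL"` in place of $Q[A]$ and FAIL, and all function names are hypothetical.

```python
def reachable(directed, nodes, start, backward=False):
    """Nodes reachable from `start` along directed edges inside `nodes` (start included)."""
    found = set(start) & set(nodes)
    changed = True
    while changed:
        changed = False
        for u, v in directed:
            src, dst = (v, u) if backward else (u, v)
            if src in found and dst in nodes and dst not in found:
                found.add(dst)
                changed = True
    return found


def consolidated_district(directed, bidirected, nodes, seed):
    """Close `seed` under 'same strongly connected component' and 'joined by <->' inside `nodes`."""
    district = set(seed) & set(nodes)
    frontier = list(district)
    while frontier:
        v = frontier.pop()
        scc = reachable(directed, nodes, {v}) & reachable(directed, nodes, {v}, backward=True)
        spouses = {q if p == v else p for p, q in bidirected
                   if v in (p, q) and p in nodes and q in nodes}
        for w in (scc | spouses) - district:
            district.add(w)
            frontier.append(w)
    return district


def idcd_outcome(directed, bidirected, c, d):
    """Trace the IDCD case split on sets only; returns 'OK' or 'FAIL'."""
    c, d = set(c), set(d)
    a = reachable(directed, d, c, backward=True)   # Line 15: A = Anc^{G[D]}(C) ∩ D
    if a == c:
        return "OK"      # Line 18: the real algorithm returns Q[A] here
    if a == d:
        return "FAIL"    # Line 20
    # Case 3 (C ⊂ A ⊂ D). ASSUMPTION: recurse with the same C and the smaller
    # district Cd^{G[A]}(C), which lies inside A and is therefore smaller than D.
    return idcd_outcome(directed, bidirected, c, consolidated_district(directed, bidirected, a, c))


# Bow graph X -> Y with X <-> Y: A equals D immediately.
bow = idcd_outcome([("X", "Y")], [("X", "Y")], {"Y"}, {"X", "Y"})

# X -> Y with X <-> Z <-> Y: A = {X, Y} is strictly between C and D, then the district shrinks to {Y}.
chain = idcd_outcome([("X", "Y")], [("X", "Z"), ("Z", "Y")], {"Y"}, {"X", "Y", "Z"})
```

Each recursive call receives a district that is a strict subset of the previous one, which is the set-level reason the recursion cannot loop forever.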

---

#### 4. Assumptions

*From Proposition 9.8, Point 2 (page 8) - Sub-ioSCM Preservation:*

For strongly connected component $S \subseteq V$, consolidated district $D = \text{Cd}^G(S)$, and $P = \text{Pa}^G(D) \setminus D$:

$$P_M(S | \text{Pred}^G_<(S) \cap V, do(\mathbf{J})) = P_{M[D]}(S | \text{Pred}^{G[D]}_<(S) \cap D, do(P))$$

**Key Idea:**
- In order to identify a strongly connected component $S$ within a consolidated district $D$, you can work entirely within district $D$ as long as you treat external influences as interventions.

**This ensures:**
- Working within consolidated district $D$ (via sub-ioSCM $M[D]$) preserves the causal effects
- External influences on $D$ are captured through $P = \text{Pa}^G(D) \setminus D$
- Justifies solving identification problem within district context with distribution $Q[D]$

*From Lemma 9.7 (page 8) - Ancestral Sufficiency:*

For ancestral set $A \subseteq V$ where $\text{Anc}^G(A) = A$:

$$P_{M[A]}(A \cap V | do(A \cap \mathbf{J})) = P_M(A \cap V | do(\mathbf{J} \cup W))$$

for any $W \subseteq V \setminus (A \cap V)$ containing $(\text{Pa}^G(A) \cap V) \setminus (A \cap V)$.

**Key Idea:**
- The causal effect on an ancestral set $A$ from interventions $J$ in the full model equals the effect in the restricted model $M[A]$ with additional interventions on external parents $W$.

**This ensures:**
- Restricting to ancestral sets (used in Line 15) preserves identifiability
- Justifies marginalization and recursion (Lines 16 and 26)
- Variables outside ancestral set are causally irrelevant

*From Remark 9.11 (page 9) - Convergence:*

The algorithm alternates between taking ancestral closures and consolidated districts until reaching a set $A$ where:
- $A$ is ancestral: $\text{Anc}^{G_A}(A) = A$
- $A$ forms single consolidated district: $\text{CD}(G_A) = \{A\}$

**What this ensures:**
- Recursion terminates in finite steps
- Each recursive call works on strictly smaller problem or returns
- No infinite loops possible

*From Remark 9.11 (page 9) - No Completeness Claim:*

The paper makes no claim about completeness of the ID algorithm.

**What this means:**
- FAIL does not necessarily imply non-identifiability
- FAIL means "this algorithm cannot identify using these techniques"
- Effect may be identifiable by other methods

*From Definition 9.2 (page 8) - Apt-Order Existence:*

For any DMG $G$, there exists an apt-order $<$ (assembling pseudo-topological order).

**What this ensures:**
- The notation $\text{Pred}^{G[D]}_<(C)$ is well-defined: an apt-order $<$ exists for any DMG, and given that order we can compute which nodes precede $C$
- Valid ordering exists for decomposing the problem
- Required for constructing identification formulas


## Line 14 - Precondition Check for IDCD

$$
\textbf{14: require: } C \subseteq D \subseteq V, \text{CD}(G_D) = \{D\}
$$

---

#### 1. Symbols

| Symbol | Meaning |
|--------|---------|
| $\textbf{require}$ | Precondition enforcement (hard constraint) |
| $C$ | Target variable set (from Line 13) |
| $\subseteq$ | Subset relation (or equal) |
| $D$ | Consolidated district (from Line 13) |
| $V$ | All observed variables in the graph |
| $\text{CD}(G_D)$ | Set of consolidated districts in induced subgraph $G_D$ |
| $G_D$ | Induced subgraph on district $D$ |
| $\{D\}$ | Singleton set containing only $D$ |

---

#### 2. Definitions

**Two Preconditions:**

**Precondition 1:** $C \subseteq D \subseteq V$ (Nested subset constraint)
- $C \subseteq D$: Target variables must be contained in the district
- $D \subseteq V$: District must be contained in all observed variables
- By transitivity: $C \subseteq V$

**Precondition 2:** $\text{CD}(G_D) = \{D\}$ (Single consolidated district constraint)
- $G_D = G[D]$: Induced subgraph on district $D$ (Definition 5.1, page 5)
- $\text{CD}(G_D)$: Set of all consolidated districts in $G_D$
- Must equal $\{D\}$: The entire set $D$ forms exactly one consolidated district
- No further decomposition into smaller districts is possible

**Enforcement:**

These are **hard constraints** - the function assumes they hold. If violated:
- Algorithm behavior is undefined
- Caller responsibility to ensure constraints are met (enforced at Line 5)
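
An implementation could turn the two preconditions into an explicit check rather than relying on the caller. The sketch below assumes consolidated districts are the classes obtained by merging nodes that share a strongly connected component or a bidirected edge; the function names and the example graph are hypothetical.

```python
def consolidated_districts(nodes, directed, bidirected):
    """Return CD(G[nodes]) as a set of frozensets."""
    nodes = set(nodes)
    directed = [(u, v) for u, v in directed if u in nodes and v in nodes]
    bidirected = [(u, v) for u, v in bidirected if u in nodes and v in nodes]

    # reach[u] = nodes reachable from u along directed edges (fine for small graphs).
    reach = {v: {v} for v in nodes}
    changed = True
    while changed:
        changed = False
        for u, v in directed:
            new = reach[v] - reach[u]
            if new:
                reach[u] |= new
                changed = True

    # Union-find: merge nodes in the same SCC or joined by a bidirected edge.
    parent = {v: v for v in nodes}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for u in nodes:
        for v in reach[u]:
            if u in reach[v]:              # mutual reachability: same SCC
                parent[find(u)] = find(v)
    for u, v in bidirected:
        parent[find(u)] = find(v)

    groups = {}
    for v in nodes:
        groups.setdefault(find(v), set()).add(v)
    return {frozenset(g) for g in groups.values()}


def check_idcd_preconditions(c, d, nodes, directed, bidirected):
    """True iff C ⊆ D ⊆ V and D is the only consolidated district of G[D]."""
    c, d, nodes = frozenset(c), frozenset(d), frozenset(nodes)
    return c <= d <= nodes and consolidated_districts(d, directed, bidirected) == {d}


# Hypothetical graph: cycle X <-> Y via X -> Y -> X, Z -> X, and Y <-> W.
V = {"W", "X", "Y", "Z"}
directed = [("X", "Y"), ("Y", "X"), ("Z", "X")]
bidirected = [("Y", "W")]

ok = check_idcd_preconditions({"Y"}, {"W", "X", "Y"}, V, directed, bidirected)
split = check_idcd_preconditions({"Y"}, {"X", "Y", "Z"}, V, directed, bidirected)
```

In this example `{"W", "X", "Y"}` passes, while `{"X", "Y", "Z"}` fails because `Z` is not coupled to the cycle and forms a district of its own.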

**Induced Subgraph $G_D$ (Definition 5.1, page 5):**

$$G_D = G[D] = (D, E^{\rightarrow}_D, E^{\leftrightarrow}_D)$$

where edges are restricted to those with both endpoints in $D$.

**Consolidated Districts (Definition 9.1, page 8):**

$$\text{CD}(G_D) = \{\text{Cd}^{G_D}(v) : v \in D\}$$

- The set of all distinct consolidated districts in $G_D$.
---

#### **Symbol Assumptions:**

*Subset Relations:*
- Subset operation $\subseteq$ is well-defined for finite sets
- Equality $\text{CD}(G_D) = \{D\}$ is well-defined
- Transitivity holds: if $C \subseteq D$ and $D \subseteq V$, then $C \subseteq V$

*Consolidated District Computation:*
- $\text{CD}(G_D)$ is computable for induced subgraph $G_D$
- Result is a set of frozensets (each district is a frozenset of variables)
- Computation terminates in finite time

*Enforcement:*
- Preconditions checked before IDCD execution begins
- If violated, function should not proceed (undefined behavior)
- Caller (Line 5) ensures constraints are satisfied
---

#### 3. English Explanation

This line specifies the **preconditions** that must hold before IDCD can execute.

**Precondition 1: Nested Subsets ($C \subseteq D \subseteq V$)**

**What it means:**
- Target variables $C$ must be inside district $D$
- District $D$ must be inside the full variable set $V$
- Creates a containment hierarchy: $C \subseteq D \subseteq V$

**Why this is required:**
- IDCD identifies effects on $C$ **within the context of district $D$**
- If $C \not\subseteq D$, we're asking about variables outside the district (invalid)
- If $D \not\subseteq V$, the district contains non-existent variables (invalid)


**Precondition 2: Single Consolidated District ($\text{CD}(G_D) = \{D\}$)**

**What it means:**
- When we look at the induced subgraph $G_D$ (graph restricted to district $D$)
- All variables in $D$ form **exactly one** consolidated district
- The district cannot be broken into smaller independent pieces

**Why this is required:**
- IDCD is designed to handle **one consolidated district at a time**
- If $D$ could be split into multiple districts, those should be handled separately (in the main loop, Lines 4-9)
- This constraint ensures $D$ is "atomic" - coupled by cycles or confounders, cannot decompose further


**When These Are Checked:**

**At Line 5 (initial call):**
- Main ID loop ensures $C \in \text{CD}(H)$ (so $C$ is a consolidated district of $G[H]$)
- Passes $D = \text{Cd}^G(C)$ (the consolidated district of $G$ containing $C$, which may be strictly larger than $C$)
- Therefore: $C \subseteq D \subseteq V$ holds ✓
- By construction, $\text{CD}(G_D) = \{D\}$ holds ✓

**At Line 26 (recursive call):**
- $C$ stays the same
- $D$ becomes $\text{Cd}^{G[A]}(C)$, the consolidated district of $C$ in $G[A]$, with $A$ from Line 15
- $C \subseteq \text{Cd}^{G[A]}(C) \subseteq A \subsetneq D$, so the nesting still holds
- $\text{CD}(G_D) = \{D\}$ holds for the new $D$, since it is a consolidated district by construction
---

#### 4. Assumptions

*From Algorithm Construction:*

These preconditions are **guaranteed by construction** when IDCD is called correctly:

**At Line 5 (initial call):**
- $C \in \text{CD}(H)$ ensures $C$ is a consolidated district from the main loop
- $D = \text{Cd}^G(C)$ is the consolidated district of $G$ containing $C$, so $C \subseteq D$
- Therefore $C \subseteq D \subseteq V$, as required

**At Line 26 (recursive call):**
- $A = \text{Anc}^{G[D]}(C) \cap D$ from Line 15 satisfies $C \subsetneq A \subsetneq D$ in Case 3
- The new district $\text{Cd}^{G[A]}(C)$ contains $C$ and is contained in $A$
- $A \subseteq D \subseteq V$, so the new district is still a subset of $V$
- Therefore $C \subseteq D \subseteq V$ holds for the new $D$

*From Definition 9.1 (page 8) - Consolidated Districts:*

The constraint $\text{CD}(G_D) = \{D\}$ means $D$ is **maximally coupled** - all variables in $D$ are connected via:
- Strongly connected components (cycles), OR
- Bidirected edges (unmeasured confounders)

This ensures $D$ cannot be decomposed into independent sub-problems.
