# PHS564 — Lecture 05 (Student)
## Causal diagrams (DAGs), d-separation, and collider bias

### Learning goals
- Read a DAG and identify:
  - causal paths and backdoor paths
  - colliders and descendants of colliders
- Use d-separation to decide which paths are open/blocked under conditioning.
- Translate “adjustment set” logic into an analysis plan.

### Required reading
- Hernán & Robins, Chapter 6 (DAGs). https://miguelhernan.org/whatifbook


In [None]:
# Colab bootstrap (run this first if you opened from a Colab badge)
# - Clones the repo into /content/PHS564 (if needed)
# - Installs requirements
# - Adds repo to sys.path

from __future__ import annotations

import os
import sys
import subprocess
from pathlib import Path


def _in_colab() -> bool:
    return "google.colab" in sys.modules


if _in_colab():
    REPO_URL = "https://github.com/vafaei-ar/PHS564.git"
    TARGET_DIR = Path("/content/PHS564")

    if not (TARGET_DIR / "requirements.txt").exists():
        print("Cloning course repo into Colab runtime...")
        subprocess.run(["git", "clone", "--depth", "1", REPO_URL, str(TARGET_DIR)], check=True)

    os.chdir(TARGET_DIR)

    print("Installing requirements...")
    subprocess.run([sys.executable, "-m", "pip", "-q", "install", "-r", "requirements.txt"], check=True)

    if str(TARGET_DIR) not in sys.path:
        sys.path.insert(0, str(TARGET_DIR))

    print("✓ Colab setup complete. Now run the rest of the notebook.")
else:
    print("Not running in Colab; skipping Colab bootstrap.")


### Setup

This notebook is designed to run **locally** or in **Google Colab**.

**Colab workflow (recommended):**
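1) Clone the course repo. In Colab, the bootstrap cell above does this for you; otherwise ask the instructor for the GitHub URL.
2) Install requirements (the bootstrap cell also handles this in Colab).
3) Run the notebook top-to-bottom.

> If you opened this notebook directly from GitHub in Colab (without cloning),
> relative paths will not work until the repo is cloned. Run the bootstrap cell first.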


In [None]:
from __future__ import annotations

import sys
from pathlib import Path

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Reproducibility
RNG = np.random.default_rng(564)

# Locate repo root (works when running from lectures/Lxx.../student or /instructor)
THIS_DIR = Path.cwd()
REPO_ROOT = THIS_DIR
for _ in range(4):
    if (REPO_ROOT / "requirements.txt").exists() or (REPO_ROOT / "README.md").exists():
        break
    REPO_ROOT = REPO_ROOT.parent

DATA_DIR = REPO_ROOT / "data"
RAW_DIR = DATA_DIR / "raw"
PROC_DIR = DATA_DIR / "processed"

print("Working directory:", THIS_DIR)
print("Repo root:", REPO_ROOT)
print("Processed data dir exists:", PROC_DIR.exists())


## Part A — Draw a DAG in Python
We represent a DAG as a list of directed edges (parent, child) and draw it with `graphviz`. If `graphviz` is unavailable, skip the drawing and focus on the simulation.


In [None]:
import networkx as nx
try:
    from graphviz import Digraph
    HAS_GRAPHVIZ = True
except Exception as e:
    HAS_GRAPHVIZ = False
    print("graphviz not available:", e)

edges = [("L","A"), ("L","Y"), ("A","Y")]  # classic confounding DAG
G = nx.DiGraph(edges)
list(G.edges())

In [None]:
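if HAS_GRAPHVIZ:
    from IPython.display import display

    dot = Digraph()
    for u, v in edges:
        dot.edge(u, v)
    # A bare `dot` inside an `if` block is not auto-displayed by Jupyter,
    # so render it explicitly.
    display(dot)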
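
Before choosing an adjustment set, it can help to list every path between treatment and outcome and label each one. The sketch below does this for the confounding DAG using only basic `networkx` calls. The helper name `classify_paths` and the label strings are illustrative assumptions, not part of the course code.

```python
import networkx as nx

# Same confounding DAG as above: L -> A, L -> Y, A -> Y
dag = nx.DiGraph([("L", "A"), ("L", "Y"), ("A", "Y")])


def classify_paths(dag, treatment, outcome):
    """Label each path between treatment and outcome (hypothetical helper).

    - "causal": every arrow points away from the treatment, toward the outcome
    - "backdoor": the first edge points INTO the treatment
    - "other non-causal": anything else
    """
    skeleton = dag.to_undirected()  # ignore arrow direction when enumerating paths
    labelled = []
    for path in nx.all_simple_paths(skeleton, treatment, outcome):
        steps = list(zip(path[:-1], path[1:]))
        if all(dag.has_edge(u, v) for u, v in steps):
            kind = "causal"
        elif dag.has_edge(path[1], path[0]):
            kind = "backdoor"
        else:
            kind = "other non-causal"
        labelled.append((path, kind))
    return labelled


for path, kind in classify_paths(dag, "A", "Y"):
    print(" - ".join(path), "->", kind)
```

The listing only tells you which paths exist and of what type. Deciding which variables block the non-causal ones is still the job of TODO A1.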

### TODO A1 — Identify a valid adjustment set
In the DAG above, what should we condition on to identify the causal effect of A on Y?

**Your answer (TODO):**


## Part B — Collider bias by simulation
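DAG: A → C ← Y. Here C is a collider on the only path between A and Y, so that path is blocked by default. Conditioning on C (or on a descendant of C) opens it and induces an association between A and Y even when A has no causal effect on Y.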
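
The d-separation claim for this DAG can be checked mechanically. The sketch below uses the standard moralization criterion: keep the relevant nodes and their ancestors, connect parents that share a child, drop arrow directions, delete the conditioning set, and test whether the two variables are still connected. The function name `d_separated` and the extra node `D` (a child of C) are assumptions made for illustration. Recent `networkx` releases also ship a d-separation helper, but its name has varied across versions, so this sketch sticks to basic graph operations.

```python
from itertools import combinations

import networkx as nx


def d_separated(dag, xs, ys, zs):
    """Return True if every x in xs is d-separated from every y in ys given zs.

    Hypothetical helper; assumes xs, ys, zs are disjoint sets of nodes in dag.
    """
    xs, ys, zs = set(xs), set(ys), set(zs)

    # 1) Ancestral subgraph: the nodes of interest plus all their ancestors
    keep = xs | ys | zs
    for node in xs | ys | zs:
        keep |= nx.ancestors(dag, node)
    sub = dag.subgraph(keep)

    # 2) Moralize: undirected copy, plus an edge between parents sharing a child
    moral = nx.Graph()
    moral.add_nodes_from(sub.nodes())
    moral.add_edges_from(sub.edges())
    for child in sub.nodes():
        for p, q in combinations(list(sub.predecessors(child)), 2):
            moral.add_edge(p, q)

    # 3) Remove the conditioning set, then 4) test connectivity
    moral.remove_nodes_from(zs)
    return not any(nx.has_path(moral, x, y) for x in xs for y in ys)


# Collider DAG from the text, with an assumed descendant D of the collider
collider_dag = nx.DiGraph([("A", "C"), ("Y", "C"), ("C", "D")])

print("A, Y given {}:  ", d_separated(collider_dag, {"A"}, {"Y"}, set()))
print("A, Y given {C}: ", d_separated(collider_dag, {"A"}, {"Y"}, {"C"}))
print("A, Y given {D}: ", d_separated(collider_dag, {"A"}, {"Y"}, {"D"}))
```

With an empty conditioning set the path is blocked. Conditioning on the collider, or on its descendant, opens it.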


In [None]:
n = 20000
A = RNG.binomial(1, 0.5, size=n)
Y = RNG.binomial(1, 0.5, size=n)  # independent of A
# Collider depends on both A and Y
pC = 1/(1+np.exp(-(-2.0 + 2.0*A + 2.0*Y)))
C = RNG.binomial(1, pC, size=n)
df = pd.DataFrame({"A":A,"Y":Y,"C":C})

# Crude risk difference, P(Y=1 | A=1) - P(Y=1 | A=0); should be ~0 because A and Y are independent
crude = df.groupby("A")["Y"].mean().diff().iloc[-1]
crude
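
The learning goals also mention descendants of colliders. As a minimal sketch, the simulation below adds a variable D that depends only on C and measures the A-Y correlation among records with D=1. The variable `D`, its probabilities (0.9 when C=1, 0.1 when C=0), the sample size, and the function name `simulate_descendant` are all assumptions chosen for illustration. It uses a correlation rather than a risk difference, so it does not replace TODO B1.

```python
import numpy as np


def simulate_descendant(n=200_000, seed=564):
    """Simulate A -> C <- Y with C -> D; return (crude corr, corr within D=1)."""
    rng = np.random.default_rng(seed)
    a = rng.binomial(1, 0.5, size=n)
    y = rng.binomial(1, 0.5, size=n)  # independent of A by construction

    # Collider: same logistic model as the cell above
    p_c = 1 / (1 + np.exp(-(-2.0 + 2.0 * a + 2.0 * y)))
    c = rng.binomial(1, p_c)

    # Descendant of the collider (assumed probabilities, for illustration only)
    p_d = np.where(c == 1, 0.9, 0.1)
    d = rng.binomial(1, p_d)

    crude = np.corrcoef(a, y)[0, 1]
    keep = d == 1  # restrict to records with D=1
    within_d1 = np.corrcoef(a[keep], y[keep])[0, 1]
    return crude, within_d1


crude_corr, corr_d1 = simulate_descendant()
print(f"corr(A, Y) overall:    {crude_corr:.3f}")
print(f"corr(A, Y) within D=1: {corr_d1:.3f}")
```

Under these assumed parameters the overall correlation should be close to zero, while the correlation within D=1 should be negative: restricting on a noisy proxy of the collider partially opens the same path.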

### TODO B1 — Conditional association given the collider
Compute the risk difference (RD) of Y comparing A=1 vs A=0 **within C=1**, that is, P(Y=1 | A=1, C=1) - P(Y=1 | A=0, C=1).


In [None]:
d = df[df["C"]==1]
rd_c1 = None  # TODO
rd_c1

## Reflection
1) Explain collider bias in one sentence.
2) How can selection into a dataset act like conditioning on a collider?
