# PHS564 — Lecture 02 (Student)
## Causal effects in ideal randomized trials

### Learning goals
- Explain how randomization gives exchangeability; contrast unconditional vs conditional randomization.
- Compute causal effects under randomization using observed outcomes.
- Understand positivity and why it is automatic in well-designed trials.
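For reference, the conditions named in these goals can be written in the potential-outcome notation of Hernán & Robins, Chapter 2. This is our transcription: $Y^a$ is the outcome a participant would have under treatment level $a$, and $L$ is a baseline covariate.

$$
\begin{aligned}
&\text{Marginal (unconditional) exchangeability:} && Y^a \perp\!\!\!\perp A \ \text{ for all } a,\\
&\text{Conditional exchangeability:} && Y^a \perp\!\!\!\perp A \mid L \ \text{ for all } a,\\
&\text{Positivity:} && \Pr[A=a \mid L=l] > 0 \ \text{ for all } a \text{ and all } l \text{ with } \Pr[L=l] > 0.
\end{aligned}
$$

A trial that assigns every treatment level with a known probability strictly between 0 and 1 within each stratum of $L$ satisfies positivity by design.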

### Required reading
- Hernán & Robins, Chapter 2. https://miguelhernan.org/whatifbook


In [None]:
# Colab bootstrap (run this first if you opened from a Colab badge)
# - Clones the repo into /content/PHS564 (if needed)
# - Installs requirements
# - Adds repo to sys.path

from __future__ import annotations

import os
import sys
import subprocess
from pathlib import Path


def _in_colab() -> bool:
    return "google.colab" in sys.modules


if _in_colab():
    REPO_URL = "https://github.com/vafaei-ar/PHS564.git"
    TARGET_DIR = Path("/content/PHS564")

    if not (TARGET_DIR / "requirements.txt").exists():
        print("Cloning course repo into Colab runtime...")
        subprocess.run(["git", "clone", "--depth", "1", REPO_URL, str(TARGET_DIR)], check=True)

    os.chdir(TARGET_DIR)

    print("Installing requirements...")
    subprocess.run([sys.executable, "-m", "pip", "-q", "install", "-r", "requirements.txt"], check=True)

    if str(TARGET_DIR) not in sys.path:
        sys.path.insert(0, str(TARGET_DIR))

    print("✓ Colab setup complete. Now run the rest of the notebook.")
else:
    print("Not running in Colab; skipping Colab bootstrap.")


### Setup

This notebook is designed to run **locally** or in **Google Colab**.

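**Colab workflow (recommended):**
1) Run the Colab bootstrap cell above. It clones the course repo (https://github.com/vafaei-ar/PHS564.git) into `/content/PHS564`, changes into that directory, installs `requirements.txt`, and adds the repo to `sys.path`.
2) Run the rest of the notebook top-to-bottom.

> If you opened this notebook directly from GitHub in Colab, relative paths
> only work after the bootstrap cell has cloned the repo, so run that cell first.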


In [None]:
from __future__ import annotations

import sys
from pathlib import Path

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Reproducibility
RNG = np.random.default_rng(564)

# Locate repo root (works when running from lectures/Lxx.../student or /instructor)
THIS_DIR = Path.cwd()
REPO_ROOT = THIS_DIR
for _ in range(4):
    if (REPO_ROOT / "requirements.txt").exists() or (REPO_ROOT / "README.md").exists():
        break
    REPO_ROOT = REPO_ROOT.parent

DATA_DIR = REPO_ROOT / "data"
RAW_DIR = DATA_DIR / "raw"
PROC_DIR = DATA_DIR / "processed"

print("Working directory:", THIS_DIR)
print("Repo root:", REPO_ROOT)
print("Processed data dir exists:", PROC_DIR.exists())


## Part A — From a 2×2 table to causal estimands

We will compute risk, risk difference (RD), risk ratio (RR), and odds ratio (OR) from an **ideal randomized trial**.
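For reference, here are the definitions. The counts in the `A=1` row are labelled $a$ (`Y=1`) and $b$ (`Y=0`), and those in the `A=0` row $c$ and $d$. These letters are our labels, not the notebook's. The estimated risks are $a/(a+b)$ for $\Pr[Y=1 \mid A=1]$ and $c/(c+d)$ for $\Pr[Y=1 \mid A=0]$.

$$
\begin{aligned}
\text{risk}_1 &= \frac{a}{a+b}, &
\text{risk}_0 &= \frac{c}{c+d},\\[4pt]
\text{RD} &= \text{risk}_1 - \text{risk}_0, &
\text{RR} &= \frac{\text{risk}_1}{\text{risk}_0},\\[4pt]
\text{OR} &= \frac{\text{risk}_1/(1-\text{risk}_1)}{\text{risk}_0/(1-\text{risk}_0)} = \frac{ad}{bc}.
\end{aligned}
$$

In an ideal randomized trial these associational measures equal the corresponding causal contrasts. For example, the RD estimates $\Pr[Y^{a=1}=1] - \Pr[Y^{a=0}=1]$.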


In [None]:
# Example RCT results (toy numbers)
# A=1 treatment, A=0 control; Y=1 event
table = pd.DataFrame(
    {"Y=1":[18, 28], "Y=0":[182, 172]},
    index=pd.Index(["A=1","A=0"], name="Group")
)
table

### TODO A1 — Compute RD, RR, and OR
Fill in the function below. It should return a dict with keys `risk1`, `risk0`, `rd`, `rr`, and `or`.


In [None]:
def effect_measures_from_table(tab: pd.DataFrame) -> dict:
    # Cells are read by label: rows "A=1"/"A=0", columns "Y=1"/"Y=0" (row order does not matter)
    a1_y1 = tab.loc["A=1","Y=1"]
    a1_y0 = tab.loc["A=1","Y=0"]
    a0_y1 = tab.loc["A=0","Y=1"]
    a0_y0 = tab.loc["A=0","Y=0"]

    # TODO: compute risks
    risk1 = None
    risk0 = None

    # TODO: compute RD, RR, OR
    rd = None
    rr = None
    or_ = None

    return {"risk1": risk1, "risk0": risk0, "rd": rd, "rr": rr, "or": or_}

effects = effect_measures_from_table(table)
effects

## Part B — Randomization ≈ exchangeability in expectation
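We simulate a 1:1 randomized trial with one baseline prognostic factor `L`. Randomization balances `L` across arms on average, but any single trial still shows some chance imbalance (sampling variability).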
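The reason a difference in observed risks is causal in an ideal trial can be written as a two-step identification argument. It assumes marginal exchangeability ($Y^a \perp\!\!\!\perp A$, which ideal randomization provides) and consistency ($Y = Y^a$ for anyone with $A = a$):

$$
\mathrm{E}[Y^a] \;=\; \mathrm{E}[Y^a \mid A=a] \;=\; \mathrm{E}[Y \mid A=a],
\qquad\text{so}\qquad
\mathrm{E}[Y^{a=1}] - \mathrm{E}[Y^{a=0}] \;=\; \mathrm{E}[Y \mid A=1] - \mathrm{E}[Y \mid A=0].
$$

The first equality uses exchangeability and the second uses consistency. In a finite trial the right-hand side is estimated with sampling error.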


In [None]:
n = 400
L = RNG.normal(size=n)  # baseline prognostic factor
A = RNG.integers(0, 2, size=n)  # simple 1:1 randomization

# Outcome model is logistic in A and L, so the treatment effect is constant on the log-odds (not risk) scale
linpred = -1.8 + 0.6*A + 0.8*L
p = 1/(1+np.exp(-linpred))
Y = RNG.binomial(1, p)

df = pd.DataFrame({"L":L, "A":A, "Y":Y})
df.head()
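To see "exchangeability in expectation" beyond the single trial above, the sketch below repeats the randomization many times with fresh data. Each time it records the difference in mean `L` between arms. The helper `arm_mean_differences`, the 2,000 replicates, and the seed are illustrative assumptions, not part of the course code.

The average imbalance should be close to 0. The spread across trials is the chance imbalance any single trial can show. For arms of about 200 participants and unit-variance `L`, that spread is roughly $\sqrt{1/200 + 1/200} = 0.1$.

```python
import numpy as np


def arm_mean_differences(n: int = 400, n_trials: int = 2000, seed: int = 564) -> np.ndarray:
    """Simulate many 1:1 randomized trials (hypothetical helper) and return,
    for each trial, mean(L | A=1) - mean(L | A=0) for a baseline L ~ N(0, 1)."""
    rng = np.random.default_rng(seed)
    diffs = np.empty(n_trials)
    for t in range(n_trials):
        L = rng.normal(size=n)
        A = rng.integers(0, 2, size=n)  # coin-flip assignment, independent of L
        diffs[t] = L[A == 1].mean() - L[A == 0].mean()
    return diffs


diffs = arm_mean_differences()
print(f"average imbalance across trials: {diffs.mean():+.4f}")
print(f"SD of imbalance across trials:   {diffs.std(ddof=1):.4f}")
```

Names are kept inside the function so the notebook's own `L`, `A`, and `df` are not overwritten.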

### TODO B1 — Check balance
Compute mean(L) by treatment arm (done in the cell below) and interpret the difference.


In [None]:
balance = df.groupby("A")["L"].mean()
balance
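The cell above uses unconditional 1:1 randomization. For contrast, the sketch below simulates conditional randomization: the probability of treatment depends on a baseline covariate. It uses a binary covariate in place of the continuous `L`. `conditional_trial` is a hypothetical helper, and every number in it is an illustrative assumption:

- assignment probabilities of 0.75 and 0.25 by stratum;
- baseline risks of 0.10 and 0.40;
- a true effect of +0.10 in every stratum;
- the sample size and seed.

Because `L` is imbalanced by design, the crude RD is not causal. Within each stratum, however, treatment is randomized and both arms occur, so positivity holds. Standardizing the stratum-specific RDs to the distribution of `L` therefore recovers the true effect.

```python
import numpy as np
import pandas as pd


def conditional_trial(n: int = 200_000, seed: int = 2) -> dict:
    """Simulate a conditionally randomized trial (hypothetical helper) and
    return the crude RD, the standardized RD, and the true causal RD."""
    rng = np.random.default_rng(seed)
    L = rng.binomial(1, 0.5, size=n)        # hypothetical high-risk indicator
    p_treat = np.where(L == 1, 0.75, 0.25)  # assignment probability depends on L only
    A = rng.binomial(1, p_treat)

    # Potential outcomes: baseline risk depends on L; treatment adds 0.10 in each stratum
    risk0 = np.where(L == 1, 0.40, 0.10)
    Y0 = rng.binomial(1, risk0)
    Y1 = rng.binomial(1, risk0 + 0.10)
    Y = np.where(A == 1, Y1, Y0)  # consistency: we observe the outcome under the assigned arm
    d = pd.DataFrame({"L": L, "A": A, "Y": Y})

    # Positivity check: every stratum of L contains both treated and untreated participants
    assert d.groupby("L")["A"].nunique().eq(2).all()

    crude = d.loc[d["A"] == 1, "Y"].mean() - d.loc[d["A"] == 0, "Y"].mean()

    # Standardization: stratum-specific RDs weighted by the marginal distribution of L
    risks = d.groupby(["L", "A"])["Y"].mean().unstack("A")
    weights = d["L"].value_counts(normalize=True).sort_index()
    standardized = ((risks[1] - risks[0]) * weights).sum()

    true_rd = (Y1 - Y0).mean()  # only computable because we simulated both potential outcomes
    return {"crude": float(crude), "standardized": float(standardized), "true": float(true_rd)}


print(conditional_trial())
```

With these assumed numbers, the expected crude RD is about 0.25. This is far from the true 0.10, because treated participants are mostly high-risk.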

### TODO B2 — Estimate the ITT effect (RD)
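Compute $\hat{\mathrm{E}}[Y \mid A=1] - \hat{\mathrm{E}}[Y \mid A=0]$ from `df` (the difference in observed outcome proportions between arms) and store it in `itt_rd`.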


In [None]:
# TODO
itt_rd = None
itt_rd
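Because we generated the data, we can also compute something that is never observable in a real trial: both potential outcomes for every participant. The sketch below reuses the Part B outcome model (intercept -1.8, coefficient 0.6 on `A`, coefficient 0.8 on `L`). It does so inside a hypothetical helper `po_vs_observed`, so the notebook's `df` is not overwritten. The seed is arbitrary.

Drawing $Y^1$ and $Y^0$ independently given `L` is an assumption of the sketch. The average causal effect depends only on their marginal distributions, so that choice does not change it.

```python
import numpy as np


def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + np.exp(-x))


def po_vs_observed(n: int = 400, seed: int = 7) -> tuple[float, float]:
    """Return (sample average causal RD, observed-data RD) for one simulated trial."""
    rng = np.random.default_rng(seed)
    L = rng.normal(size=n)
    # Both potential outcomes for every participant (only possible in a simulation)
    Y1 = rng.binomial(1, expit(-1.8 + 0.6 * 1 + 0.8 * L))
    Y0 = rng.binomial(1, expit(-1.8 + 0.6 * 0 + 0.8 * L))
    A = rng.integers(0, 2, size=n)  # simple 1:1 randomization
    Y = np.where(A == 1, Y1, Y0)  # consistency: only the outcome under the assigned arm is seen
    causal_rd = (Y1 - Y0).mean()  # E[Y^1] - E[Y^0] in this sample
    observed_rd = Y[A == 1].mean() - Y[A == 0].mean()  # E[Y | A=1] - E[Y | A=0]
    return float(causal_rd), float(observed_rd)


print(po_vs_observed())  # one trial of n = 400: the two numbers may differ noticeably by chance
print(po_vs_observed(n=200_000))  # a very large trial: the two numbers nearly coincide
```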

## Part C — Bootstrap CI for RD (optional)


In [None]:
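def bootstrap_rd(
    data: pd.DataFrame, B: int = 500, rng: np.random.Generator = RNG
) -> tuple[float, float, float]:
    """Observed RD and a 95% percentile-bootstrap CI, resampling rows with replacement."""

    def rd_of(d: pd.DataFrame) -> float:
        # Difference in outcome proportions between arms
        return d.loc[d["A"] == 1, "Y"].mean() - d.loc[d["A"] == 0, "Y"].mean()

    n = len(data)
    rds = np.empty(B)
    for b in range(B):
        idx = rng.integers(0, n, size=n)  # seeded RNG makes the resamples reproducible
        rds[b] = rd_of(data.iloc[idx])
    lo, hi = np.quantile(rds, [0.025, 0.975])
    return float(rd_of(data)), float(lo), float(hi)

bootstrap_rd(df, B=300)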
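As a cross-check on the percentile bootstrap, the sketch below computes the large-sample (Wald) 95% interval for an RD from arm-level counts:

$\widehat{\text{RD}} \pm 1.96\sqrt{\hat p_1(1-\hat p_1)/n_1 + \hat p_0(1-\hat p_0)/n_0}$

`wald_rd_ci` is a hypothetical helper, not a course or library function. The normal approximation is an assumption; it works best when each arm has a reasonable number of events.

```python
import math


def wald_rd_ci(y1: int, n1: int, y0: int, n0: int, z: float = 1.96) -> tuple[float, float, float]:
    """Risk difference and a Wald CI from event counts (y) and arm sizes (n)."""
    p1, p0 = y1 / n1, y0 / n0
    rd = p1 - p0
    # Standard error of a difference of two independent proportions
    se = math.sqrt(p1 * (1 - p1) / n1 + p0 * (1 - p0) / n0)
    return rd, rd - z * se, rd + z * se


# Illustrative counts only (not the notebook's data): 50/100 events vs 30/100 events
print(wald_rd_ci(50, 100, 30, 100))
```

To compare with the bootstrap, pass the Part B event counts and arm sizes from `df`. For the treated arm these are `int(df.loc[df["A"] == 1, "Y"].sum())` and `int((df["A"] == 1).sum())`. In a trial of this size the two intervals will usually be similar, but they need not agree exactly.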

## Reflection questions
1) Why does randomization justify a causal interpretation of ITT?
2) What changes if noncompliance exists?
