# PHS564 — Lecture 12 (Student)
## Capstone: target trial emulation workshop + project presentations (MIMIC-IV Demo or full MIMIC-IV optional)

### Learning goals
- Convert an informal clinical question into a **complete target trial protocol**.
- Stress-test protocol choices for bias: time zero, censoring, competing risks (brief), positivity.
- Produce a “protocol v2” that is executable as code + tables/figures.

### Required reading
(see course plan)


### Setup

This notebook is designed to run **locally** or in **Google Colab**.

**Colab workflow (recommended):**
1) Clone the course repo (ask the instructor for the GitHub URL).
2) Install requirements.
3) Run the notebook top-to-bottom.

> If you opened this notebook directly from GitHub in Colab (without cloning),
> relative paths will not work. Clone first.


In [None]:
from __future__ import annotations

import sys
from pathlib import Path

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Reproducibility
RNG = np.random.default_rng(564)

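# Locate repo root (works when running from lectures/Lxx.../student or /instructor)
THIS_DIR = Path.cwd()
REPO_ROOT = THIS_DIR
for _ in range(4):
    # Stop at the first directory that contains a repo marker file
    if (REPO_ROOT / "requirements.txt").exists() or (REPO_ROOT / "README.md").exists():
        break
    REPO_ROOT = REPO_ROOT.parent
else:
    # No marker found within four levels: fall back to the working directory
    # instead of silently pointing at an unchecked ancestor
    REPO_ROOT = THIS_DIR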

DATA_DIR = REPO_ROOT / "data"
RAW_DIR = DATA_DIR / "raw"
PROC_DIR = DATA_DIR / "processed"

print("Working directory:", THIS_DIR)
print("Repo root:", REPO_ROOT)
print("Processed data dir exists:", PROC_DIR.exists())


## Target trial protocol template (team worksheet)

Fill this in **as a 2-person team**. This notebook is a living document for your capstone.

### 1) Causal question (informal)
- **Population:**  
- **Intervention / strategy A=1:**  
- **Comparator / strategy A=0:**  
- **Outcome:**  
- **Time zero (index):**  
- **Estimand:** (ATE? ATT? risk difference at 28 days? etc.)
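
To make the estimand concrete, the sketch below computes a crude (unadjusted) 28-day risk difference on a tiny hand-made cohort. The function name `risk_difference` and the columns `A` and `death_28d` are hypothetical, chosen only for illustration. The result describes the quantity you are targeting; it is not a confounding-adjusted estimate.

```python
import pandas as pd


def risk_difference(df, treatment, outcome):
    """Crude risk difference: P(Y=1 | A=1) - P(Y=1 | A=0)."""
    risk = df.groupby(treatment)[outcome].mean()
    return float(risk.loc[1] - risk.loc[0])


# Hypothetical toy cohort (not MIMIC data):
# A = strategy indicator, death_28d = outcome observed by day 28
toy_rd = pd.DataFrame({
    "A":         [1, 1, 1, 1, 0, 0, 0, 0],
    "death_28d": [1, 0, 0, 0, 1, 1, 0, 0],
})

rd_28d = risk_difference(toy_rd, "A", "death_28d")
print(f"Crude 28-day risk difference: {rd_28d:+.2f}")
```

Writing the estimand as a function early makes it harder for the team to drift between different contrasts (for example, risk ratio versus risk difference) later in the analysis.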

### 2) Target trial specification (table)

| Component | Your choice (precise, implementable) |
|---|---|
| Eligibility criteria | |
| Treatment strategies | |
| Assignment procedure | (observational analogue; how you emulate randomization) |
| Follow-up (start/end) | |
| Outcome definition | |
| Causal contrast | (intention-to-treat vs per-protocol analogue) |
| Analysis plan | (estimator: g-formula / IPW / MSM / doubly robust; diagnostics) |
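
One possible way to keep the table above "executable" is to store it as a small data structure and check for blanks before running any analysis. This is only a sketch: the class `TargetTrialProtocol`, its field names, and the example entry are hypothetical and mirror the table rows, not any course-provided code.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class TargetTrialProtocol:
    """One field per row of the target trial specification table."""
    eligibility: str = ""
    treatment_strategies: str = ""
    assignment_procedure: str = ""
    follow_up: str = ""
    outcome_definition: str = ""
    causal_contrast: str = ""
    analysis_plan: str = ""

    def missing_components(self) -> List[str]:
        """Names of components that are still blank."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]


# Hypothetical partial draft: only eligibility has been filled in so far
draft = TargetTrialProtocol(eligibility="Adults with a first ICU stay")
print("Still to specify:", draft.missing_components())
```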

### 3) DAG + confounder set
- Draw a DAG for your question (baseline + time-varying, if needed).
- List the minimal adjustment set(s) you will use and justify clinically.
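
As a starting point for the confounder list, a DAG can be encoded as a list of directed edges and queried for common causes of treatment and outcome. The node names below (`severity`, `age`) are placeholders, and `ancestors` is a hypothetical helper written for this sketch. Listing shared ancestors is a heuristic first pass, not a formal backdoor-criterion check, so the result still needs clinical and graphical justification.

```python
# Hypothetical DAG: each edge points from cause to effect
edges = [
    ("severity", "A"),
    ("severity", "Y"),
    ("age", "A"),
    ("age", "Y"),
    ("A", "Y"),
]


def ancestors(node, edges):
    """Return every node with a directed path into `node`."""
    parents = {}
    for cause, effect in edges:
        parents.setdefault(effect, set()).add(cause)
    found, stack = set(), [node]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


# Variables that are ancestors of both treatment A and outcome Y
common_causes = ancestors("A", edges) & ancestors("Y", edges)
print("Candidate confounders:", sorted(common_causes))
```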

### 4) Data mapping (MIMIC variables)
- Map each trial component to MIMIC variables / tables.
- Define code lists / thresholds (ICD, labs, vitals) where needed.
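
A code list can be kept as a named constant and applied with a small helper, so the definition is visible and versioned with the notebook. The sketch below assumes a diagnoses table with columns `icd_code` and `icd_version` and codes stored without dots; verify both against your extract. The prefixes shown (ICD-10 `A40`, `A41`) are illustrative only and are not a validated sepsis definition. `flag_codes` is a hypothetical helper.

```python
import pandas as pd

# Illustrative code list only; replace with your team's justified definition
SEPSIS_ICD10_PREFIXES = ("A40", "A41")


def flag_codes(dx, prefixes, version=10):
    """Boolean Series: True where the row matches the ICD version and any prefix."""
    codes = dx["icd_code"].astype(str).str.strip()
    matches_prefix = codes.map(lambda code: code.startswith(tuple(prefixes)))
    return (dx["icd_version"] == version) & matches_prefix


# Hypothetical toy diagnoses table (column names are assumptions)
toy_dx = pd.DataFrame({
    "icd_code":    ["A419", "I10", "A40"],
    "icd_version": [10, 10, 9],
})
toy_dx["sepsis_flag"] = flag_codes(toy_dx, SEPSIS_ICD10_PREFIXES)
print(toy_dx)
```

Filtering on the ICD version as well as the prefix avoids accidental matches between ICD-9 and ICD-10 codes that share leading characters.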

### 5) Threat model (bias audit)
- Exchangeability: what's missing / unmeasured?
- Positivity: where do treatment strategies become unrealistic?
- Consistency: multiple versions of treatment?
- Measurement error: outcome/treatment misclassification?
- Selection bias: censoring / discharge / loss to follow-up?
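
For the positivity item, one simple diagnostic is to cross-tabulate treatment against a covariate stratum and flag cells with few or no patients under one of the strategies. This is a minimal sketch on made-up data: `positivity_table`, the `age_group` column, and the `min_count` threshold are assumptions for illustration, and a real audit would look at combinations of confounders or at estimated propensity scores.

```python
import pandas as pd


def positivity_table(df, treatment, stratum, min_count=5):
    """Count treated/untreated per stratum and flag sparse cells."""
    counts = pd.crosstab(df[stratum], df[treatment])
    # Make sure both treatment columns exist even if one level is never observed
    counts = counts.reindex(columns=[0, 1], fill_value=0)
    counts.columns = ["n_untreated", "n_treated"]
    counts["sparse"] = (counts["n_untreated"] < min_count) | (counts["n_treated"] < min_count)
    return counts


# Hypothetical toy cohort: nobody aged 65+ receives the strategy A=1
toy_pos = pd.DataFrame({
    "age_group": ["<65", "<65", "<65", "<65", "65+", "65+", "65+"],
    "A":         [0, 1, 0, 1, 0, 0, 0],
})
pos_table = positivity_table(toy_pos, "A", "age_group", min_count=1)
print(pos_table)
```

A flagged stratum suggests either restricting eligibility or acknowledging that estimates there rely on model extrapolation.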

### 6) Deliverables checklist (what you will submit)
- Target trial table (filled)
- 1-page methods memo (PDF)
- Reproducible notebook(s) that run on Demo data
- Figures: DAG, weight diagnostics (if IPW/MSM), effect estimates with CI
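
If your team uses IPW or an MSM, the weight diagnostics could include a summary of stabilized weights and the effective sample size. The sketch below assumes you already have a propensity score per patient; `stabilized_weights` and `effective_sample_size` are hypothetical helper names, and the effective sample size uses Kish's approximation, $(\sum w)^2 / \sum w^2$.

```python
import numpy as np


def stabilized_weights(a, ps):
    """Stabilized inverse probability of treatment weights.

    a  : 0/1 treatment indicator
    ps : estimated P(A=1 | confounders), strictly between 0 and 1
    """
    a = np.asarray(a)
    ps = np.asarray(ps, dtype=float)
    p_treated = a.mean()  # marginal probability of treatment (numerator)
    return np.where(a == 1, p_treated / ps, (1 - p_treated) / (1 - ps))


def effective_sample_size(w):
    """Kish's approximate effective sample size for a weight vector."""
    w = np.asarray(w, dtype=float)
    return w.sum() ** 2 / (w ** 2).sum()


# Hypothetical example: treatment unrelated to covariates, so every weight is 1
a_toy = np.array([1, 0, 1, 0])
ps_toy = np.array([0.5, 0.5, 0.5, 0.5])
w_toy = stabilized_weights(a_toy, ps_toy)
print("weights:", w_toy, "| ESS:", effective_sample_size(w_toy))
```

Stabilized weights with a mean far from 1, a very large maximum, or an effective sample size much smaller than the cohort size are common warning signs of positivity problems or propensity model misspecification.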


## Data exploration scaffold (optional)
If your team uses an instructor-provided extract, load it here.


In [None]:
parquet_path = PROC_DIR / "cohort_L12_capstone.parquet"
csv_path = PROC_DIR / "cohort_L12_capstone.csv"

if parquet_path.exists():
    cap = pd.read_parquet(parquet_path)
elif csv_path.exists():
    cap = pd.read_csv(csv_path)
else:
    cap = None
    print("No L12 capstone extract found yet (this is OK).")

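if cap is not None:
    print("n =", len(cap))
    # A bare expression inside an `if` block is not rendered by Jupyter,
    # so display the preview explicitly
    from IPython.display import display
    display(cap.head())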


## Checklist before you submit
- [ ] Time zero is unambiguous.
- [ ] Eligibility criteria are implementable.
- [ ] Confounder list is justified clinically.
- [ ] Diagnostics are included (positivity, weights).
- [ ] You can explain the estimator in plain language.
