# PHS564 — Lecture 04 (Student)
## Effect modification and effect-measure modification

### Learning goals
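- Define effect modification (the causal effect of A on Y differs across strata of a third variable M) vs effect-measure modification (whether the effect differs depends on the scale, e.g. RD vs RR).
- Compute stratum-specific causal effects and interpret them.
- Discuss when to pool stratum-specific estimates and when to report them separately; link this choice to transportability.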

### Required reading
- Hernán & Robins, Chapter 4. https://miguelhernan.org/whatifbook


In [None]:
# Colab bootstrap (run this first if you opened from a Colab badge)
# - Clones the repo into /content/PHS564 (if needed)
# - Installs requirements
# - Adds repo to sys.path

from __future__ import annotations

import os
import sys
import subprocess
from pathlib import Path


def _in_colab() -> bool:
    return "google.colab" in sys.modules


if _in_colab():
    REPO_URL = "https://github.com/vafaei-ar/PHS564.git"
    TARGET_DIR = Path("/content/PHS564")

    if not (TARGET_DIR / "requirements.txt").exists():
        print("Cloning course repo into Colab runtime...")
        subprocess.run(["git", "clone", "--depth", "1", REPO_URL, str(TARGET_DIR)], check=True)

    os.chdir(TARGET_DIR)

    print("Installing requirements...")
    subprocess.run([sys.executable, "-m", "pip", "-q", "install", "-r", "requirements.txt"], check=True)

    if str(TARGET_DIR) not in sys.path:
        sys.path.insert(0, str(TARGET_DIR))

    print("✓ Colab setup complete. Now run the rest of the notebook.")
else:
    print("Not running in Colab; skipping Colab bootstrap.")


### Setup

This notebook is designed to run **locally** or in **Google Colab**.

**Colab workflow (recommended):**
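1) Run the Colab bootstrap cell at the top of this notebook; it clones the course repo into `/content/PHS564`.
2) The same cell installs the requirements and adds the repo to `sys.path`.
3) Run the rest of the notebook top-to-bottom.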

> If you opened this notebook directly from GitHub in Colab (without cloning),
> relative paths will not work. Clone first.


In [None]:
from __future__ import annotations

import sys
from pathlib import Path

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Reproducibility
RNG = np.random.default_rng(564)

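# Locate repo root (works when running from lectures/Lxx.../student or /instructor).
# Check the working directory and up to 3 parents; if no marker file is found,
# fall back to the working directory instead of an unchecked ancestor.
THIS_DIR = Path.cwd()
REPO_ROOT = THIS_DIR
for _ in range(4):
    if (REPO_ROOT / "requirements.txt").exists() or (REPO_ROOT / "README.md").exists():
        break
    REPO_ROOT = REPO_ROOT.parent
else:
    REPO_ROOT = THIS_DIR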

DATA_DIR = REPO_ROOT / "data"
RAW_DIR = DATA_DIR / "raw"
PROC_DIR = DATA_DIR / "processed"

print("Working directory:", THIS_DIR)
print("Repo root:", REPO_ROOT)
print("Processed data dir exists:", PROC_DIR.exists())


## Part A — Simulate effect modification
We simulate a randomized treatment `A`, a binary outcome `Y`, and a binary effect modifier `M`, so that the effect of `A` on `Y` differs between the strata M=0 and M=1.


In [None]:
# Statsmodels for regression (logit/ols); installed via requirements.txt
import statsmodels.api as sm
import statsmodels.formula.api as smf

n = 4000
M = RNG.integers(0, 2, size=n)        # binary effect modifier
L = RNG.normal(size=n)                # prognostic factor (no interaction with A on the logit scale)
A = RNG.binomial(1, 0.5, size=n)      # randomized treatment, so there is no confounding

# Outcome model on the logit scale with an A*M interaction:
# conditional on L, the odds ratio for A is exp(1.0) when M=0 and exp(1.0 + 1.2) when M=1.
linpred = -2.0 + 1.0*A + 1.0*M + 1.2*(A*M) + 0.6*L
pY = 1 / (1 + np.exp(-linpred))
Y = RNG.binomial(1, pY)
df = pd.DataFrame({"M": M, "L": L, "A": A, "Y": Y})
df.head()

### TODO A1 — Compute stratum-specific risks and RD/RR
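Within each stratum (M=0 and M=1), compute the risk under A=1 and under A=0, the risk difference (RD = risk1 - risk0), and the risk ratio (RR = risk1 / risk0). Because `A` is randomized, these associational measures estimate the stratum-specific causal effects.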
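
As a benchmark for the estimates in this TODO, the true stratum-specific risks implied by the simulation can be approximated by averaging the outcome model over `L`. The sketch below is one way to do that, assuming the coefficients from the Part A cell and `L ~ N(0, 1)`; the names `expit`, `true_risk`, and `truth`, the seed, and the number of Monte Carlo draws are illustrative choices, not part of the course code.

```python
import numpy as np


def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + np.exp(-x))


# Coefficients copied from the Part A simulation cell.
B0, B_A, B_M, B_AM, B_L = -2.0, 1.0, 1.0, 1.2, 0.6

# Monte Carlo draws of L ~ N(0, 1); seed and sample size are arbitrary (assumption).
rng = np.random.default_rng(0)
L_draws = rng.normal(size=200_000)


def true_risk(a: int, m: int) -> float:
    """Approximate Pr(Y=1 | A=a, M=m) by averaging the outcome model over L."""
    linpred = B0 + B_A * a + B_M * m + B_AM * a * m + B_L * L_draws
    return float(expit(linpred).mean())


truth = {}
for m in (0, 1):
    r1, r0 = true_risk(1, m), true_risk(0, m)
    truth[m] = {"risk1": r1, "risk0": r0, "rd": r1 - r0, "rr": r1 / r0}
    print(f"M={m}: risk1={r1:.3f}  risk0={r0:.3f}  RD={r1 - r0:.3f}  RR={r1 / r0:.3f}")
```

Estimates from the simulated data should fall close to these values, up to sampling variability with n = 4000.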


In [None]:
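def measures_by_stratum(data: pd.DataFrame, stratum_value: int) -> dict:
    """Risks under A=1 and A=0, risk difference, and risk ratio within one stratum of M."""
    d = data[data["M"] == stratum_value]
    r1 = d.loc[d["A"] == 1, "Y"].mean()   # risk among the treated
    r0 = d.loc[d["A"] == 0, "Y"].mean()   # risk among the untreated
    rd = r1 - r0
    rr = r1 / r0
    return {"M": stratum_value, "risk1": r1, "risk0": r0, "rd": rd, "rr": rr}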

out = pd.DataFrame([measures_by_stratum(df,0), measures_by_stratum(df,1)])
out

### TODO A2 — Interaction in a regression model
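Fit a logistic regression `Y ~ A + M + A:M` and interpret the `A:M` coefficient. On the logit scale, it is the difference between the log odds ratio for `A` in stratum M=1 and the log odds ratio for `A` in stratum M=0.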
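
To make the interpretation concrete, the sketch below derives the coefficients that a saturated logit model with `A`, `M`, and `A:M` would imply for a set of cell risks, and converts them to stratum-specific odds ratios. The cell risks are hypothetical numbers chosen for easy arithmetic, not output from the simulation, and the variable names are illustrative.

```python
import math


def logit(p: float) -> float:
    """Log odds of a probability."""
    return math.log(p / (1.0 - p))


# Hypothetical cell risks Pr(Y=1 | A=a, M=m), keyed by (a, m). Made up for illustration.
risk = {(0, 0): 0.10, (1, 0): 0.25, (0, 1): 0.25, (1, 1): 0.75}

# A saturated model reproduces the four cell risks, so its coefficients are
# differences of cell log odds.
b0 = logit(risk[(0, 0)])
b_A = logit(risk[(1, 0)]) - b0                          # log OR for A when M=0
b_M = logit(risk[(0, 1)]) - b0                          # log OR for M when A=0
b_AM = logit(risk[(1, 1)]) - logit(risk[(0, 1)]) - b_A  # difference in log ORs for A

or_m0 = math.exp(b_A)          # odds ratio for A in stratum M=0
or_m1 = math.exp(b_A + b_AM)   # odds ratio for A in stratum M=1
ratio_of_ors = math.exp(b_AM)  # multiplicative interaction on the odds scale

print(f"OR(M=0)={or_m0:.2f}  OR(M=1)={or_m1:.2f}  ratio of ORs={ratio_of_ors:.2f}")
```

The same conversion applies to `model.params`: exponentiate the `A` coefficient for the M=0 odds ratio, and the sum of the `A` and `A:M` coefficients for the M=1 odds ratio. Because the fitted model omits `L` and the odds ratio is non-collapsible, the fitted coefficients need not match the conditional values 1.0 and 1.2 used in the simulation.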


In [None]:
model = smf.logit("Y ~ A + M + A:M", data=df).fit(disp=False)
model.params

## Part B — Scale matters
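The same stratum-specific risks can show modification on one scale and not on another: the effect of `A` may be constant across strata of `M` as a risk ratio (RR) while differing as a risk difference (RD), or the reverse.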
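
A small numeric sketch of this point, using two hypothetical sets of stratum-specific risks (the numbers and the names `scenarios`, `rd_rr`, and `summary` are invented for illustration and are not taken from the simulation):

```python
# Each scenario maps stratum m -> (risk under A=0, risk under A=1).
# Hypothetical values chosen so that exactly one measure is constant across strata.
scenarios = {
    "constant RR": {0: (0.10, 0.20), 1: (0.30, 0.60)},
    "constant RD": {0: (0.10, 0.20), 1: (0.30, 0.40)},
}


def rd_rr(risk0: float, risk1: float) -> tuple[float, float]:
    """Return (risk difference, risk ratio) for one stratum."""
    return risk1 - risk0, risk1 / risk0


summary = {}
for name, strata in scenarios.items():
    summary[name] = {m: rd_rr(*risks) for m, risks in strata.items()}
    for m, (rd, rr) in summary[name].items():
        print(f"{name:12s} M={m}: RD={rd:.2f}  RR={rr:.2f}")
```

In the first scenario the RR is the same in both strata while the RD differs; in the second the RD is the same while the RR differs. When the baseline risk differs across strata and the treatment has an effect, RD and RR cannot both be constant, which is why a claim of effect modification should always name the measure.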


In [None]:
# Tabulate stratum-specific risks Pr(Y=1 | A=a, M=m) for plotting (rows: M, columns: A)
risk_plot = (
    df.groupby(["M", "A"], as_index=False)["Y"]
    .mean()
    .rename(columns={"Y": "risk"})
)
risk_plot_pivot = risk_plot.pivot(index="M", columns="A", values="risk")
risk_plot_pivot

In [None]:
plt.figure()
for a in [0,1]:
    plt.plot(risk_plot_pivot.index, risk_plot_pivot[a], marker="o", label=f"A={a}")
plt.xticks([0,1])
plt.xlabel("Effect modifier M")
plt.ylabel("Risk Pr(Y=1)")
plt.legend()
plt.title("Stratum-specific risks")
plt.show()

## Reflection
1) Give a clinical example where you expect effect modification.
2) Why do we care about the effect measure (RD vs RR) when discussing modification?
