# Lecture 02: Causal Effects in Ideal Randomized Trials

[!["Open In Colab"](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/<ORG>/<REPO>/blob/main/lectures/L02_Ideal_RCTs/L02_Ideal_RCTs_student.ipynb)

## Learning Objectives
1. Explain how **randomization** ensures exchangeability.
2. Calculate causal effects (Risk Difference/Ratio) in randomized trials.
3. Understand **stratified randomization** and how to adjust for it.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from phs564_ci.datasets import load_data

# Load the randomized trial data
df = load_data("l02_ideal_rct.csv")
df.head()

--- 
## 🛑 Activity 1: RCT Abstract Dissection (Slide 10)

**Identify the following from your assigned paper:**
1. **Estimand:** What is the primary causal effect they want to measure? (e.g., the intention-to-treat (ITT) effect of Drug X on 30-day mortality).
2. **Estimator:** How did they calculate it? (e.g., Difference in proportions, Cox model hazard ratio).
3. **Target Population:** Who are the results intended for?

--- 
### 1. Simple Analysis
In an ideal RCT, randomization makes the treated and untreated groups exchangeable, so the average causal effect equals the difference in observed mean outcomes between the two arms.
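To see why the observed difference in means recovers the causal effect, here is a minimal simulation sketch. Everything in it is hypothetical (the seed, the sample size, the prognostic factor `L`, and the risk values are illustrative assumptions, not the course data). It generates both potential outcomes for every person, randomizes treatment with a coin flip, and compares the observed risk difference with the true one.

```python
import numpy as np

rng = np.random.default_rng(564)  # arbitrary seed, for reproducibility only
n = 100_000

# Hypothetical potential outcomes: L raises risk by 0.20,
# and treatment lowers risk by 0.10 on the risk scale.
L = rng.binomial(1, 0.4, size=n)
y0 = rng.binomial(1, 0.30 + 0.20 * L)  # outcome if untreated
y1 = rng.binomial(1, 0.20 + 0.20 * L)  # outcome if treated
true_rd = y1.mean() - y0.mean()

# Randomization: A is a coin flip, independent of L, y0, and y1
A = rng.binomial(1, 0.5, size=n)

# Consistency: we only observe the potential outcome for the arm received
Y = np.where(A == 1, y1, y0)
observed_rd = Y[A == 1].mean() - Y[A == 0].mean()

print(f"True RD (both potential outcomes): {true_rd:.3f}")
print(f"Observed RD (difference in means): {observed_rd:.3f}")
```

The two numbers should agree up to sampling variability, because the coin flip cannot depend on anyone's prognosis.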

In [None]:
risk_treated = df[df['A'] == 1]['Y'].mean()
risk_untreated = df[df['A'] == 0]['Y'].mean()

rd = risk_treated - risk_untreated
rr = risk_treated / risk_untreated

print(f"Risk Difference (RD): {rd:.3f}")
print(f"Risk Ratio (RR): {rr:.3f}")
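A point estimate is usually reported with an interval. The sketch below computes a Wald (normal-approximation) 95% confidence interval for the risk difference. The function name `risk_difference_ci` and the toy counts are assumptions made for illustration; the column names `A` and `Y` follow the notebook's convention of 0/1 coding.

```python
import numpy as np
import pandas as pd

def risk_difference_ci(data, treatment="A", outcome="Y", z=1.96):
    """Risk difference with a Wald (normal-approximation) interval."""
    treated = data.loc[data[treatment] == 1, outcome]
    untreated = data.loc[data[treatment] == 0, outcome]
    p1, p0 = treated.mean(), untreated.mean()
    # Standard error of a difference between two independent proportions
    se = np.sqrt(p1 * (1 - p1) / len(treated) + p0 * (1 - p0) / len(untreated))
    rd = p1 - p0
    return rd, rd - z * se, rd + z * se

# Hypothetical toy trial: 200 treated (60 events), 200 untreated (80 events)
toy = pd.DataFrame({
    "A": [1] * 200 + [0] * 200,
    "Y": [1] * 60 + [0] * 140 + [1] * 80 + [0] * 120,
})
rd_toy, ci_low, ci_high = risk_difference_ci(toy)
print(f"RD: {rd_toy:.3f} (95% CI {ci_low:.3f} to {ci_high:.3f})")
```

Calling `risk_difference_ci(df)` on the trial data should work the same way, assuming `A` and `Y` are both coded 0/1. The Wald interval can be inaccurate when risks are near 0 or 1 or when arms are small.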

### 2. Stratified Analysis
This trial was stratified by a covariate `L` (e.g., Sex).

In [None]:
# Group by L and A, then calculate mean of Y
stratified_results = df.groupby(['L', 'A'])['Y'].mean().unstack()
stratified_results['RD'] = stratified_results[1] - stratified_results[0]
stratified_results
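On the design side, stratified randomization means treatment is assigned separately within each level of `L`, which guarantees balance on `L` rather than leaving it to chance. The following is only a rough sketch of a 1:1 allocation within strata; the helper `stratified_randomize` and the participant counts are hypothetical, and real trials typically use permuted blocks inside each stratum.

```python
import numpy as np
import pandas as pd

def stratified_randomize(data, stratum="L", seed=0):
    """Assign treatment 1:1 within each level of `stratum` (illustrative only)."""
    rng = np.random.default_rng(seed)
    assignment = pd.Series(0, index=data.index, name="A")
    for _, idx in data.groupby(stratum).groups.items():
        # Shuffle the row labels in this stratum and treat the first half
        shuffled = rng.permutation(idx.to_numpy())
        assignment.loc[shuffled[: len(shuffled) // 2]] = 1
    return assignment

# Hypothetical enrollment: 60 participants with L = 0, 40 with L = 1
participants = pd.DataFrame({"L": [0] * 60 + [1] * 40})
participants["A"] = stratified_randomize(participants, seed=564)

balance = pd.crosstab(participants["L"], participants["A"])
print(balance)
```

Each row of the crosstab splits evenly between the two arms, so the distribution of `L` is identical in the treated and untreated groups by construction.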

### 3. Standardization (Adjustment)
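Standardization gives a single marginal effect adjusted for the stratification variable: weight each stratum-specific risk by the proportion of the population in that stratum, then sum across strata.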

In [None]:
# Calculate the distribution of L in the study population
# (the mean equals Pr[L = 1] only because L is coded 0/1)
prob_l1 = df['L'].mean()
prob_l0 = 1 - prob_l1

# Standardized Risk = Sum(Risk in Stratum * Weight of Stratum)
std_risk_a1 = (stratified_results.loc[1, 1] * prob_l1) + (stratified_results.loc[0, 1] * prob_l0)
std_risk_a0 = (stratified_results.loc[1, 0] * prob_l1) + (stratified_results.loc[0, 0] * prob_l0)

print(f"Standardized RD: {std_risk_a1 - std_risk_a0:.3f}")
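The same calculation can be written so that it does not depend on `L` being binary. The sketch below is one possible generalization, not the course's reference implementation: the function `standardized_risks`, the helper `make_cell`, and the toy counts are all hypothetical.

```python
import pandas as pd

def standardized_risks(data, treatment="A", outcome="Y", stratum="L"):
    """Standardize stratum-specific risks to the overall distribution of `stratum`."""
    # Rows: levels of the stratum; columns: treatment arms
    stratum_risks = data.groupby([stratum, treatment])[outcome].mean().unstack(treatment)
    # Pr[L = l] in the whole study population
    weights = data[stratum].value_counts(normalize=True).reindex(stratum_risks.index)
    # Weighted sum over strata; an empty cell surfaces as NaN instead of being skipped
    return stratum_risks.mul(weights, axis=0).sum(skipna=False)

def make_cell(l, a, n, events):
    """Build n rows for one (L, A) cell with the given number of events."""
    return pd.DataFrame({"L": l, "A": a, "Y": [1] * events + [0] * (n - events)})

# Hypothetical toy trial with stratum-specific risks 0.2/0.4 (L=0) and 0.4/0.6 (L=1)
toy = pd.concat(
    [make_cell(0, 1, 50, 10), make_cell(0, 0, 50, 20),
     make_cell(1, 1, 150, 60), make_cell(1, 0, 150, 90)],
    ignore_index=True,
)
risks = standardized_risks(toy)
print(f"Standardized RD: {risks[1] - risks[0]:.3f}")
```

In this toy example the weights are 0.25 and 0.75, so the standardized risks are 0.25 × 0.2 + 0.75 × 0.4 = 0.35 under treatment and 0.25 × 0.4 + 0.75 × 0.6 = 0.55 without it.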

--- 
## 🛑 Activity 2: Design a mini RCT (Slide 19)

For the causal question you defined in Lecture 01, design a simple RCT protocol.

1. **Randomization Strategy:** Simple or Stratified?
2. **Blinding:** Who will be blinded?
3. **Adherence Plan:** How will you handle people who don't follow the protocol?

--- 
### 🖼️ Figure Generation: RCT vs Observational (Slide 23)
Let's visualize how randomization balances covariates compared to an observational study.

In [None]:
# Load observational data from L03 for comparison
df_obs = load_data("l03_observational.csv")

plt.figure(figsize=(12, 5))

plt.subplot(1, 2, 1)
sns.barplot(x='A', y='L', data=df)
plt.title("RCT: L by Treatment A")
plt.ylabel("Mean of Covariate L")

plt.subplot(1, 2, 2)
sns.barplot(x='A', y='L', data=df_obs)
plt.title("Observational: L by Treatment A")
plt.ylabel("Mean of Covariate L")

plt.tight_layout()
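# Create the output folder first; savefig does not create missing directories
import os
os.makedirs("figures/L02", exist_ok=True)
plt.savefig("figures/L02/sim_rct_vs_obs.png")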
plt.show()
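Covariate balance can also be summarized numerically. One common choice is the standardized mean difference (SMD), where values near zero indicate balance. The sketch below uses simulated stand-ins for the two datasets, since the actual files are loaded elsewhere; the function name, seed, and assignment probabilities are illustrative assumptions.

```python
import numpy as np
import pandas as pd

def standardized_mean_difference(data, covariate="L", treatment="A"):
    """Difference in covariate means between arms, in pooled-SD units."""
    treated = data.loc[data[treatment] == 1, covariate]
    untreated = data.loc[data[treatment] == 0, covariate]
    pooled_sd = np.sqrt((treated.var() + untreated.var()) / 2)
    return (treated.mean() - untreated.mean()) / pooled_sd

rng = np.random.default_rng(564)  # arbitrary seed
n = 20_000
L = rng.binomial(1, 0.5, size=n)

# Hypothetical RCT: treatment is a coin flip, unrelated to L
sim_rct = pd.DataFrame({"L": L, "A": rng.binomial(1, 0.5, size=n)})
# Hypothetical observational study: people with L = 1 are treated more often
sim_obs = pd.DataFrame({"L": L, "A": rng.binomial(1, 0.2 + 0.6 * L)})

smd_rct = standardized_mean_difference(sim_rct)
smd_obs = standardized_mean_difference(sim_obs)
print(f"SMD of L, simulated RCT:           {smd_rct:.3f}")
print(f"SMD of L, simulated observational: {smd_obs:.3f}")
```

The same function could be applied to `df` and `df_obs`, assuming both contain columns `L` and `A` coded as in this notebook. An absolute SMD below roughly 0.1 is often used as a rule of thumb for adequate balance.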

### 5. Summary
- Randomization creates exchangeability by design.
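- Intention-to-treat (ITT) analysis compares participants by assigned arm, which preserves the exchangeability that randomization created.
- Adjusting for the stratification variable (e.g., by standardization) is not needed to remove confounding in an ideal RCT, but it can improve precision.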