# Lecture 01: Counterfactuals and Causal Effects

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/<ORG>/<REPO>/blob/main/lectures/L01_Counterfactuals/L01_Counterfactuals_student.ipynb)

## Learning Objectives
1. Define **potential outcomes** (counterfactuals).
2. Understand why individual causal effects are unobservable (The Fundamental Problem).
3. Distinguish between **association** and **causation**.
4. See how confounding masks the true causal effect.

### 1. Setup
We will load our helper functions and the dataset for today's lecture.

```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from phs564_ci.datasets import load_data
from phs564_ci.plotting.causal_plots import plot_potential_outcomes, plot_observed_association

# Load the synthetic dataset
df = load_data("l01_potential_outcomes.csv")
df.head()
```

--- 
## 🛑 Activity 1: Write your own causal question (Slide 07)

Working with your partner, draft a causal question related to your research field. 

**Question Checklist:**
1. **Population:** Who are we studying?
2. **Intervention ($A=1$ vs. $A=0$):** Is each treatment strategy well-defined?
3. **Outcome (Y):** What is the measure of success?
4. **Time Zero (T0):** When does follow-up start?

**Write your draft below:**

--- 
### 2. The Potential Outcomes (The God's Eye View)
In this simulation, we know both potential outcomes:
- `Y_a0`: the outcome the person would have had if they had NOT been treated ($A=0$).
- `Y_a1`: the outcome the person would have had if they HAD been treated ($A=1$).

**Note:** In real data, we see only `Y` (the observed outcome), never both potential outcomes for the same person.
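To make the Fundamental Problem concrete, here is a minimal sketch on hypothetical toy data (five made-up people, not the lecture dataset; the names `god_view` and `observed_view` are illustrative only). It hides, for each person, the potential outcome that a real dataset would never record.

```python
import pandas as pd

# Hypothetical toy data: both potential outcomes are known only because we made them up.
god_view = pd.DataFrame({
    "A":    [1, 0, 1, 0, 0],
    "Y_a0": [3.0, 2.0, 4.0, 1.0, 5.0],
    "Y_a1": [5.0, 2.5, 4.0, 3.0, 4.0],
})

# What an analyst would actually have: the potential outcome for the treatment
# a person did NOT receive is replaced by NaN (Series.where keeps values where the condition is True).
observed_view = god_view.assign(
    Y_a1=god_view["Y_a1"].where(god_view["A"] == 1),  # NaN for the untreated
    Y_a0=god_view["Y_a0"].where(god_view["A"] == 0),  # NaN for the treated
)
print(observed_view)

# Every row is missing one of the two potential outcomes,
# so the individual effect Y_a1 - Y_a0 cannot be computed for anyone.
individual_effects = observed_view["Y_a1"] - observed_view["Y_a0"]
print(individual_effects.isna().all())  # True
```

The same "missing data" view applies to the lecture dataset: dropping the `Y_a0`/`Y_a1` columns leaves only `A` and `Y`.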

```python
plot_potential_outcomes(df)
```

### 3. Calculating the True Causal Effect
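Because this simulation gives us both potential outcomes for every person, we can calculate the true Average Causal Effect (ACE), $E[Y^{1}] - E[Y^{0}]$, directly as the difference between the means of `Y_a1` and `Y_a0`.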

```python
true_ace = df['Y_a1'].mean() - df['Y_a0'].mean()
print(f"The True Average Causal Effect is: {true_ace:.3f}")
```
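The ACE is computed as a difference of two means, but it also equals the mean of the individual effects `Y_a1 - Y_a0`, because expectation is linear: $E[Y^{1}] - E[Y^{0}] = E[Y^{1} - Y^{0}]$. A quick check on hypothetical toy numbers (not the lecture data):

```python
import numpy as np
import pandas as pd

# Hypothetical toy potential outcomes for five people.
po = pd.DataFrame({
    "Y_a0": [3.0, 2.0, 4.0, 1.0, 5.0],
    "Y_a1": [5.0, 2.5, 4.0, 3.0, 4.0],
})

# Individual causal effects: computable here only because both columns are known.
po["individual_effect"] = po["Y_a1"] - po["Y_a0"]

diff_of_means = po["Y_a1"].mean() - po["Y_a0"].mean()
mean_of_diffs = po["individual_effect"].mean()

# Linearity of expectation: both routes give the same number (up to float rounding).
print(f"Difference of means: {diff_of_means:.3f}")  # 0.700
print(f"Mean of differences: {mean_of_diffs:.3f}")  # 0.700
print(np.isclose(diff_of_means, mean_of_diffs))     # True
```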

### 4. Association vs. Causation
Now let's look at what we actually observe in the data. `A` is the treatment assignment, and `Y` is the outcome we see.
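In potential-outcome notation, this section contrasts two different quantities. The associational difference conditions on the treatment actually received, so it compares two different groups of people; the ACE compares the same population under two different treatments:

$$
\underbrace{E[Y \mid A=1] - E[Y \mid A=0]}_{\text{association}}
\qquad \text{vs.} \qquad
\underbrace{E[Y^{1}] - E[Y^{0}]}_{\text{average causal effect}}
$$

The two need not be equal. They coincide only under additional assumptions, for example when treatment is randomly assigned so that the treated and untreated groups are comparable.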

```python
plot_observed_association(df)

# Compare mean observed outcomes between the treated and untreated groups
mean_treated = df.loc[df['A'] == 1, 'Y'].mean()
mean_untreated = df.loc[df['A'] == 0, 'Y'].mean()
observed_diff = mean_treated - mean_untreated

print(f"Observed Difference (Association): {observed_diff:.3f}")
print(f"True Causal Effect: {true_ace:.3f}")
```
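The helper-generated dataset does not show how confounding arises. The following is a hypothetical simulation (not the lecture dataset; the variable `L`, the seed, and all effect sizes are assumptions chosen for illustration) in which a pre-treatment variable `L` raises both the probability of treatment and the outcome, while the true effect of treatment is exactly +1 for everyone.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(564)  # arbitrary seed
n = 100_000

# Hypothetical binary confounder L, measured before treatment (e.g., baseline severity).
L = rng.binomial(1, 0.5, size=n)

# L raises the probability of treatment: 0.8 if L = 1, 0.2 if L = 0.
A = rng.binomial(1, np.where(L == 1, 0.8, 0.2))

# L also raises the outcome; treatment adds exactly +1 for every person.
Y_a0 = 2.0 * L + rng.normal(0.0, 1.0, size=n)
Y_a1 = Y_a0 + 1.0
Y = np.where(A == 1, Y_a1, Y_a0)  # outcome under the treatment each person received

sim = pd.DataFrame({"L": L, "A": A, "Y": Y, "Y_a0": Y_a0, "Y_a1": Y_a1})

true_ace_sim = sim["Y_a1"].mean() - sim["Y_a0"].mean()
observed_diff_sim = (
    sim.loc[sim["A"] == 1, "Y"].mean() - sim.loc[sim["A"] == 0, "Y"].mean()
)

print(f"True ACE:            {true_ace_sim:.3f}")
print(f"Observed difference: {observed_diff_sim:.3f}")
```

By construction, $P(L=1 \mid A=1) = 0.8$ and $P(L=1 \mid A=0) = 0.2$, so the observed difference should land near $1 + 2 \times (0.8 - 0.2) = 2.2$ instead of the true 1.0: the treated group has higher outcomes partly because it contains more people with $L=1$.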

--- 
## 🛑 Activity 2: List candidate confounders (Slide 31)

For the causal question you wrote in Activity 1, list all variables that might affect **both** your treatment and your outcome. 

**Constraint:** These must be variables that exist *before* the treatment is assigned.

**List your candidate confounders (L) here:**

--- 
## 🛑 Activity 3: Concept check (Slide 38)

**Question 1:** If a subject is treated ($A=1$), what is their observed outcome $Y$ in terms of potential outcomes?

**Question 2:** Can we ever observe the individual causal effect $(Y_i^1 - Y_i^0)$ for a single person? Why or why not?

**Question 3:** If the Average Treatment Effect (ATE, the same quantity as the ACE computed above) is 0, does that mean the treatment had no effect on *anyone* in the population?

### 5. Summary
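- **Association** compares observed outcomes between the groups that actually received each treatment, e.g., $E[Y \mid A=1] - E[Y \mid A=0]$.
- **Causation** contrasts potential outcomes for the same population under different treatments (counterfactual worlds), e.g., $E[Y^{1}] - E[Y^{0}]$.
- **Identification** requires assumptions (such as no unmeasured confounding) to link the two.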