# Observational Data
## From Observational Data to Conditionally Randomized Experiments

Causal inference from observational data relies on the idea that, under the assumptions of the **Rubin Causal Model** (Rubin, 1978), an observational study can be regarded as a *conditionally randomized experiment*. 

Under the ignorability assumptions (see below), the observed data share the essential features of a randomized experiment, which enables the identification and consistent estimation of causal effects within the **potential outcomes framework**.


## Potential Outcomes Framework
- Rubin (1974, 1978) extended Neyman's (1923) theory for randomized experiments to observational studies.
- Specifically, when treatment assignment $T$ is **strongly ignorable** given a set of observed covariates $X$, that is, when the following two conditions hold:

$$
(Y(1), Y(0)) \perp T \mid X
$$

$$
0 < P(T = 1 \mid X) < 1,
$$

then conditioning on $X$ renders the treatment assignment mechanism analogous to that of a randomized controlled trial. For more detail, see the [assumptions guide](https://www.uniqcret.com/post/causal-inference-assumptions-guide).
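To make the analogy concrete, here is a minimal simulation sketch of a conditionally randomized experiment. Everything in it is an illustrative assumption rather than part of the source: a single binary covariate, treatment probabilities of 0.2 and 0.8, a constant true effect of 1, and the helper names `simulate`, `naive_difference`, and `stratified_difference`.

```python
import random

random.seed(0)

def simulate(n=100_000):
    """Simulate (x, t, y) rows where treatment is randomized within levels of X."""
    rows = []
    for _ in range(n):
        x = 1 if random.random() < 0.5 else 0      # binary covariate
        p_treat = 0.8 if x == 1 else 0.2            # P(T=1 | X=x), strictly inside (0, 1)
        t = 1 if random.random() < p_treat else 0   # assignment depends on X only
        y0 = 2.0 * x + random.gauss(0, 1)           # potential outcome under control
        y1 = y0 + 1.0                               # potential outcome under treatment
        y = y1 if t == 1 else y0                    # observed outcome
        rows.append((x, t, y))
    return rows

def mean(values):
    return sum(values) / len(values)

def naive_difference(rows):
    """Treated-minus-control difference in means, ignoring X."""
    treated = [y for _, t, y in rows if t == 1]
    control = [y for _, t, y in rows if t == 0]
    return mean(treated) - mean(control)

def stratified_difference(rows):
    """Difference in means within each level of X, averaged over the distribution of X."""
    n = len(rows)
    total = 0.0
    for x_val in (0, 1):
        stratum = [r for r in rows if r[0] == x_val]
        treated = [y for _, t, y in stratum if t == 1]
        control = [y for _, t, y in stratum if t == 0]
        total += (len(stratum) / n) * (mean(treated) - mean(control))
    return total

rows = simulate()
naive = naive_difference(rows)
adjusted = stratified_difference(rows)
print(f"naive: {naive:.2f}, stratified: {adjusted:.2f}")
```

Under these assumptions the naive comparison is confounded by $X$ (it should land near 2.2), while the comparison made within levels of $X$ should recover a value close to the true effect of 1.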

### Assumptions
#### **Consistency**: 

$$
Y = Y(a) \quad \text{when } A = a, \qquad \text{equivalently } Y = Y(A)
$$

Consistency links the potential outcomes to the observed outcome: the potential outcome $Y(a)$ equals the factual outcome $Y$ whenever the treatment actually received is $A = a$. Here $A$ denotes the treatment, written $T$ elsewhere in these notes.
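For a binary treatment, consistency is often written as a single "switching" equation. This is a standard restatement, included here only as a compact way to read the assumption:

$$
Y = A \, Y(1) + (1 - A) \, Y(0)
$$

When $A = 1$ the second term vanishes and $Y = Y(1)$; when $A = 0$ the first term vanishes and $Y = Y(0)$.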

#### **Stable Unit Treatment Value Assumption (SUTVA)**: 

SUTVA combines consistency and no interference. The **no interference** assumption states that:

$$
Y_i(a_i) \text{ depends only on } a_i \text{ (not on } a_j \text{ for } j \neq i)
$$

That is, there is no interference between units: whether one individual receives treatment has no effect on the potential outcomes of any other individual. The consistency assumption already makes this assumption implicitly, because $Y_i(a)$ is indexed only by unit $i$'s own treatment. No interference is usually taken to be covered by the standard statistical 'i.i.d.' assumption, but it is easily violated, for example in a study of the effect of a vaccine or when treatment is assigned at a group level.
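The vaccine case can be sketched with a toy risk function. The function `infection_risk` and all of its numbers are hypothetical, invented purely to illustrate what a violation of no interference looks like:

```python
def infection_risk(own_vaccinated: bool, share_others_vaccinated: float) -> float:
    """Hypothetical model: risk falls with own vaccination AND with coverage among others."""
    base_risk = 0.30
    own_factor = 0.4 if own_vaccinated else 1.0          # direct protection
    herd_factor = 1.0 - 0.5 * share_others_vaccinated    # indirect protection
    return base_risk * own_factor * herd_factor

# Same own treatment (unvaccinated), different treatments for everyone else:
risk_low_coverage = infection_risk(False, 0.1)
risk_high_coverage = infection_risk(False, 0.9)
print(risk_low_coverage, risk_high_coverage)
```

Because the two values differ even though the individual's own treatment is the same, a single potential outcome $Y_i(a_i)$ is not well defined in this toy setting; the outcome would have to be indexed by the whole treatment vector.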

#### **Positivity**: 

$$
0 < P(T = 1 \mid X) < 1 \quad \text{with probability } 1
$$

or more strictly (strict positivity):

$$
\varepsilon < P(T = 1 \mid X) < 1 - \varepsilon \quad \text{with probability } 1, \text{ for some } \varepsilon > 0
$$

This states that, for every covariate profile that occurs in the population, treatment assignment is not deterministic: except on a subset of the population with measure zero, the probability of receiving treatment **and** the probability of receiving control are both non-zero.
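With discrete covariates, a simple empirical diagnostic is to tabulate the share of treated units in each stratum. The sketch below assumes records of the form `(stratum, treatment)`; the helper names and the toy data are hypothetical. An empirical check like this can flag apparent violations in a sample, but it cannot prove that positivity holds in the population.

```python
from collections import defaultdict

def propensity_by_stratum(records):
    """Return the empirical P(T=1 | X=x) for each observed covariate value x."""
    counts = defaultdict(lambda: [0, 0])  # x -> [n_treated, n_total]
    for x, t in records:
        counts[x][0] += t
        counts[x][1] += 1
    return {x: treated / total for x, (treated, total) in counts.items()}

def positivity_violations(records, eps=0.0):
    """List strata whose empirical treatment share is not strictly inside (eps, 1 - eps)."""
    props = propensity_by_stratum(records)
    return sorted(x for x, p in props.items() if not (eps < p < 1 - eps))

# Hypothetical toy data: (covariate stratum, treatment indicator)
records = [
    ("young", 1), ("young", 0), ("young", 1),
    ("old", 1), ("old", 1), ("old", 1),   # every "old" unit is treated
]

print(propensity_by_stratum(records))
print(positivity_violations(records))
```

Setting `eps` above zero mimics the strict positivity condition, flagging strata whose treatment share is merely close to 0 or 1.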

#### **Exchangeability (No unobserved confounding)**: 

$$
Y(a) \perp\!\!\!\perp A \mid X \quad \text{for all } a \in \mathcal{A}
$$

or equivalently:

$$
(Y(1), Y(0)) \perp\!\!\!\perp A \mid X
$$

This implies that conditioning on the patient characteristics (covariates $X$) is sufficient to remove confounding bias in estimated treatment effects, including heterogeneous treatment effects (HTEs). This is also called *conditional exchangeability*, *conditional ignorability*, or *causal sufficiency*.
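Taken together, the assumptions above let us write the mean potential outcome in terms of observed quantities only. The following is the standard identification argument, sketched with each step annotated by the assumption it uses:

$$
\begin{aligned}
\mathbb{E}[Y(a)] &= \mathbb{E}_X\big[\mathbb{E}[Y(a) \mid X]\big] && \text{(law of total expectation)} \\
&= \mathbb{E}_X\big[\mathbb{E}[Y(a) \mid A = a, X]\big] && \text{(exchangeability; positivity makes the conditioning well defined)} \\
&= \mathbb{E}_X\big[\mathbb{E}[Y \mid A = a, X]\big] && \text{(consistency)}
\end{aligned}
$$

The last line involves only the observed variables $(Y, A, X)$, so it can be estimated from data.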

## Treatment Effect Estimation
### Individual Treatment Effect
For each unit $i$ (say, a person):

- $T_i$: treatment indicator (1 if treated, 0 if control)
- $Y_i(1)$: potential outcome **if treated**
- $Y_i(0)$: potential outcome **if not treated**

The **causal effect** (ITE: individual treatment effect) for unit $i$ is:

$$
\tau_i = Y_i(1) - Y_i(0)
$$

### Fundamental Problem of Causal Inference
However, we can only **observe one** of these outcomes: the one corresponding to the treatment actually received. This is known as the **fundamental problem of causal inference**: we never observe both potential outcomes for the same unit.

For each unit, the observed outcome is either $Y(1)$ (if treated) or $Y(0)$ (if untreated); the other potential outcome is missing. Individual causal effects therefore cannot be expressed as a function of the observed data, and identifying them is generally impossible. Nonetheless, we can aim to identify the average causal effect in a population of interest.
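The missing-data view can be sketched in a few lines. The `Unit` record and the helpers `observe` and `individual_effect` are hypothetical names used for illustration; `observe` plays the role of an oracle that knows both potential outcomes and reveals only the factual one.

```python
from typing import NamedTuple, Optional

class Unit(NamedTuple):
    t: int                  # treatment actually received
    y0: Optional[float]     # outcome under control, None if unobserved
    y1: Optional[float]     # outcome under treatment, None if unobserved

def observe(t: int, y0: float, y1: float) -> Unit:
    """Keep only the potential outcome that matches the treatment received."""
    return Unit(t, y0 if t == 0 else None, y1 if t == 1 else None)

def individual_effect(u: Unit) -> Optional[float]:
    """Return y1 - y0 when both are known, otherwise None."""
    if u.y0 is None or u.y1 is None:
        return None
    return u.y1 - u.y0

treated_unit = observe(1, 2.0, 5.0)
print(treated_unit)
print(individual_effect(treated_unit))
```

For every unit built with `observe`, one field is `None`, so `individual_effect` can never be computed from observed data alone.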

### Average Treatment Effect (ATE)

The **Average Treatment Effect** (ATE) compares the average response if everyone were assigned to receive treatment versus if everyone were assigned to receive control:

$$
\mathrm{ATE} = \mathbb{E}[Y(1)] - \mathbb{E}[Y(0)]
$$

In do-notation, this corresponds to:

$$
\mathrm{ATE} = \mathbb{E}[Y \mid \mathrm{do}(A=1)] - \mathbb{E}[Y \mid \mathrm{do}(A=0)]
$$

The ATE represents the average causal effect in the entire population of interest.
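As one possible estimation sketch, the ATE can be computed by inverse probability weighting (IPW), a technique the text above does not itself describe. The sketch assumes a single discrete covariate, propensities estimated as stratum-wise treatment shares, and that consistency, positivity, and exchangeability given $X$ hold. The toy data and the function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy data: (x, a, y)
data = [
    (0, 1, 4.0), (0, 0, 1.0), (0, 0, 2.0), (0, 0, 3.0),
    (1, 1, 6.0), (1, 1, 7.0), (1, 1, 8.0), (1, 0, 5.0),
]

def estimate_propensity(data):
    """Empirical P(A=1 | X=x) for each covariate value x."""
    counts = defaultdict(lambda: [0, 0])  # x -> [n_treated, n_total]
    for x, a, _ in data:
        counts[x][0] += a
        counts[x][1] += 1
    return {x: treated / total for x, (treated, total) in counts.items()}

def ipw_ate(data):
    """Horvitz-Thompson style IPW estimate of E[Y(1)] - E[Y(0)]."""
    e = estimate_propensity(data)
    n = len(data)
    mean_y1 = sum(a * y / e[x] for x, a, y in data) / n
    mean_y0 = sum((1 - a) * y / (1 - e[x]) for x, a, y in data) / n
    return mean_y1 - mean_y0

def difference_in_means(data):
    """Unadjusted treated-minus-control difference, for comparison."""
    treated = [y for _, a, y in data if a == 1]
    control = [y for _, a, y in data if a == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)

print(ipw_ate(data), difference_in_means(data))
```

In this toy data the effect is 2 within each stratum, and the IPW estimate recovers that value, whereas the unadjusted difference in means is inflated because treated units are concentrated in the stratum with higher outcomes.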

### Average Treatment Effect on the Treated (ATT)

The **Average Treatment Effect on the Treated** (ATT) considers the effect only for those who actually received treatment:

$$
\mathrm{ATT} = \mathbb{E}[Y(1) \mid A=1] - \mathbb{E}[Y(0) \mid A=1]
$$

Similarly, we can define the **Average Treatment Effect on the Controls** (ATC) by conditioning on $A=0$ instead. The ATT is the natural estimand when the question concerns those who actually received the treatment, which is often the more relevant quantity for policy decisions.
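The difference between ATE, ATT, and ATC can be sketched by averaging stratum-specific effects over different covariate distributions. This assumes a discrete covariate, exchangeability given $X$, and positivity in every stratum. The toy data and the helpers `stratum_effects` and `weighted_effect` are hypothetical.

```python
# Hypothetical toy data: (x, a, y); the effect is 2 when x == 0 and 5 when x == 1
data = [
    (0, 1, 4.0), (0, 0, 1.0), (0, 0, 2.0), (0, 0, 3.0),
    (1, 1, 9.0), (1, 1, 10.0), (1, 1, 11.0), (1, 0, 5.0),
]

def stratum_effects(data):
    """Treated-minus-control difference in mean observed outcome within each level of X."""
    effects = {}
    for x_val in sorted({x for x, _, _ in data}):
        treated = [y for x, a, y in data if x == x_val and a == 1]
        control = [y for x, a, y in data if x == x_val and a == 0]
        effects[x_val] = sum(treated) / len(treated) - sum(control) / len(control)
    return effects

def weighted_effect(data, keep):
    """Average the stratum effects over the covariate values of the units selected by `keep`."""
    effects = stratum_effects(data)
    target = [x for x, a, _ in data if keep(a)]
    return sum(effects[x] for x in target) / len(target)

ate = weighted_effect(data, lambda a: True)      # everyone
att = weighted_effect(data, lambda a: a == 1)    # treated units only
atc = weighted_effect(data, lambda a: a == 0)    # control units only
print(ate, att, atc)
```

Because treated units are concentrated in the stratum with the larger effect, the ATT exceeds the ATE here, and the ATC falls below it.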

### Conditional Average Treatment Effect (CATE)

The **Conditional Average Treatment Effect** (CATE) allows us to identify **heterogeneous treatment effects** (HTE) by conditioning on covariates:

$$
\mathrm{CATE}(x) = \mathbb{E}[Y(1) \mid X=x] - \mathbb{E}[Y(0) \mid X=x]
$$

for some collection of covariates $X$. 

A heterogeneous treatment effect is present if there exist two values $x, x'$ such that $\mathrm{CATE}(x) \neq \mathrm{CATE}(x')$. This enables us to understand how treatment effects vary across different subpopulations defined by the covariates.
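One common way to estimate $\mathrm{CATE}(x)$, not described in the text above, is to fit a separate outcome model in each treatment arm and take the difference of their predictions (often called a T-learner). The sketch below assumes a single continuous covariate, linear outcome models, and exchangeability given $X$. The noise-free toy data and the function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Hypothetical toy data, generated without noise:
#   control arm: y = 1.0 + 1.0 * x
#   treated arm: y = 1.5 + 3.0 * x   (so the true CATE is 0.5 + 2.0 * x)
control_x = [0.0, 1.0, 2.0, 3.0]
control_y = [1.0, 2.0, 3.0, 4.0]
treated_x = [0.5, 1.5, 2.5, 3.5]
treated_y = [3.0, 6.0, 9.0, 12.0]

b0, s0 = fit_line(control_x, control_y)   # model for E[Y | A=0, X=x]
b1, s1 = fit_line(treated_x, treated_y)   # model for E[Y | A=1, X=x]

def cate(x):
    """Difference between the two fitted regression functions at covariate value x."""
    return (b1 + s1 * x) - (b0 + s0 * x)

print(cate(1.0), cate(3.0))
```

The two printed values differ, which is exactly the condition $\mathrm{CATE}(x) \neq \mathrm{CATE}(x')$ for a heterogeneous treatment effect. With real data the estimate would depend on how well each arm's model is specified.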

**Reference**: [Causal Effects - Oxford APTS](https://www.stats.ox.ac.uk/~evans/APTS/ce.html)