# 4. `Conditional Probability`

1. Incomplete information and sub-sigma-algebras. 
2. Bayes’ formula as a ratio of measures. 
3. Prior probability and its update. 
4. Definition of independence.



## `1. Incomplete information and sub-sigma-algebras`

### Motivation: “updating after we learn something”
When we learn that some event **B happened**, we **restrict** attention to the outcomes inside **B**, and then **renormalise** so that total probability becomes 1 again.


### **`Def. 2.2.1`: Conditional probability**
If $A$ and $B$ are events with $P(B)>0$, then the conditional probability of $A$ given $B$ is
$$
P(A\mid B)=\frac{P(A\cap B)}{P(B)}.
$$
The slides use the following terms:
- $P(A)$ is the **prior probability** of $A$,
- $P(A\mid B)$ is the **posterior probability** of $A$ (updated after observing $B$).


### Intuition 2.2.3 (Pebble world): “restrict + renormalise”
1) $B$ occurred, so only take pebbles from $B$  
2) Renormalise, so that total mass is still $1$  
So you get a new probability scale “inside $B$”.
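
As a minimal sketch of “restrict + renormalise” on a finite sample space, the two steps can be written out directly. The fair die and the events `even` and `at_least_four` are illustrative assumptions, not taken from the slides.

```python
from fractions import Fraction

def conditional_probability(mass, a, b):
    """P(A | B) on a finite sample space given as a dict outcome -> mass."""
    # Step 1: restrict to B (only pebbles inside B still count).
    p_b = sum(mass[w] for w in b)
    if p_b == 0:
        raise ValueError("P(B) must be positive")
    p_a_and_b = sum(mass[w] for w in a & b)
    # Step 2: renormalise so that the total mass inside B is 1 again.
    return p_a_and_b / p_b

# Hypothetical example: one roll of a fair six-sided die.
mass = {w: Fraction(1, 6) for w in range(1, 7)}
even = {2, 4, 6}
at_least_four = {4, 5, 6}

posterior = conditional_probability(mass, even, at_least_four)
print(posterior)  # 2/3
```

Here the prior is $P(\text{even})=1/2$, and learning that the roll is at least 4 raises it to $2/3$.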


### Incomplete information as a *partition* (what the slides use)
Often we do not learn the exact outcome, only which *region* of the sample space it lies in.
This is represented by a partition $A_1,\dots,A_n$ of the sample space:
- the $A_i$ are pairwise disjoint
- $\bigcup_{i=1}^n A_i = S$ (the sample space)

This is the setup for the Law of Total Probability.

> Note: your topic list mentions **sub-$\sigma$-algebras**. In these slides, the “partial information” is handled mainly via conditioning on an event $B$ or on a partition $\{A_i\}$.

## `2. Bayes’ formula as a ratio of measures`

### The key identity behind Bayes
**`Thm. 2.3.1`: Probability of intersection**
For events $A,B$ with positive probabilities:
$$
P(A\cap B)=P(B)\,P(A\mid B)=P(A)\,P(B\mid A).
$$

### **`Thm. 2.3.3`: Bayes’ rule**
From the intersection identity:
$$
P(A\mid B)=\frac{P(B\mid A)\,P(A)}{P(B)}.
$$
Interpretation: posterior = (likelihood × prior) / evidence.



### **`Def. 2.3.4`: Odds**
For an event $A$, the **odds** are
$$
\text{odds}(A)=\frac{P(A)}{P(A^c)}.
$$
Example from the slides: if $P(A)=2/3$, then the odds are $2:1$ (that is, $\text{odds}(A)=2$). Conversely:
$$
P(A)=\frac{\text{odds}(A)}{1+\text{odds}(A)}.
$$
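
The two conversions can be sketched as a pair of helper functions. The function names are hypothetical, and exact fractions are used only to keep the arithmetic readable.

```python
from fractions import Fraction

def odds_from_probability(p):
    """odds(A) = P(A) / P(A^c); undefined when P(A) = 1."""
    if p == 1:
        raise ValueError("odds are undefined when P(A) = 1")
    return p / (1 - p)

def probability_from_odds(odds):
    """Inverse conversion: P(A) = odds / (1 + odds)."""
    return odds / (1 + odds)

p = Fraction(2, 3)
o = odds_from_probability(p)
print(o, probability_from_odds(o))  # 2 2/3
```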


### **`Thm. 2.3.5`: Odds form of Bayes’ rule**
For events $A,B$ with positive probabilities (and with $P(A^c)>0$ and $P(B\mid A^c)>0$, so that every ratio below is defined):
$$
\frac{P(A\mid B)}{P(A^c\mid B)}
=
\frac{P(B\mid A)}{P(B\mid A^c)}
\cdot
\frac{P(A)}{P(A^c)}.
$$
The factor
$$
\frac{P(B\mid A)}{P(B\mid A^c)}
$$
is called the **likelihood ratio** in the slides.
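
A short derivation sketch, assuming $0<P(A)<1$ and $P(B)>0$: write Bayes’ rule once for $A$ and once for $A^c$, then divide. The common factor $P(B)$ cancels:
$$
\frac{P(A\mid B)}{P(A^c\mid B)}
=
\frac{P(B\mid A)\,P(A)/P(B)}{P(B\mid A^c)\,P(A^c)/P(B)}
=
\frac{P(B\mid A)}{P(B\mid A^c)}
\cdot
\frac{P(A)}{P(A^c)}.
$$
In words: posterior odds = likelihood ratio × prior odds.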

## `3. Prior probability and its update`

### Prior vs posterior (language used in the slides)
- **Prior**: $P(A)$ (before observing evidence)
- **Posterior**: $P(A\mid B)$ (after observing evidence $B$)

This is exactly what Bayes’ rule computes:
$$
P(A\mid B)=\frac{P(B\mid A)\,P(A)}{P(B)}.
$$


### **`Thm. 2.3.6`: Law of total probability (LOTP)**
If $A_1,\dots,A_n$ is a partition of $S$ and $P(A_i)>0$, then for any event $B$:
$$
P(B)=\sum_{i=1}^n P(B\mid A_i)\,P(A_i).
$$
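
A short derivation sketch: the sets $B\cap A_1,\dots,B\cap A_n$ are disjoint and their union is $B$, so additivity followed by the intersection identity $P(B\cap A_i)=P(B\mid A_i)\,P(A_i)$ gives
$$
P(B)=\sum_{i=1}^n P(B\cap A_i)=\sum_{i=1}^n P(B\mid A_i)\,P(A_i).
$$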



### Example 2.3.9 (Testing for a disease): update can be surprising
Given:
- prevalence (prior): $P(D)=0.01$
- test sensitivity: $P(T\mid D)=0.95$
- test specificity (true negative rate): $P(T^c\mid D^c)=0.95$, so the false positive rate is $P(T\mid D^c)=0.05$

Then:
$$
P(D\mid T)
=
\frac{P(T\mid D)\,P(D)}
{P(T\mid D)\,P(D)+P(T\mid D^c)\,P(D^c)}
\approx 0.16.
$$
So even a “95% accurate” test can lead to a posterior of only ~16% when the prior is very small.
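
The arithmetic behind this example can be checked with a small sketch. It computes the posterior twice, once with Bayes’ rule (expanding $P(T)$ by LOTP) and once with the odds form. The function and variable names are hypothetical.

```python
from fractions import Fraction

def posterior_via_bayes(prior, sensitivity, false_positive_rate):
    """P(D | T) from Bayes' rule, with P(T) expanded by the law of total probability."""
    evidence = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / evidence

def posterior_via_odds(prior, sensitivity, false_positive_rate):
    """Same posterior via the odds form: posterior odds = likelihood ratio * prior odds."""
    prior_odds = prior / (1 - prior)
    likelihood_ratio = sensitivity / false_positive_rate
    posterior_odds = likelihood_ratio * prior_odds
    return posterior_odds / (1 + posterior_odds)

prior = Fraction(1, 100)                # P(D)
sensitivity = Fraction(95, 100)         # P(T | D)
false_positive_rate = Fraction(5, 100)  # P(T | D^c)

p_bayes = posterior_via_bayes(prior, sensitivity, false_positive_rate)
p_odds = posterior_via_odds(prior, sensitivity, false_positive_rate)
print(p_bayes, p_odds)  # 19/118 19/118
```

Both routes give $19/118\approx 0.161$. The likelihood ratio is $19$, but the prior odds are only $1/99$, which is why the posterior stays small.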


### Bayes with extra conditioning (updating *given* some background info)
**`Thm. 2.4.2`: Bayes’ rule with extra conditioning**
Provided $P(A\cap E)>0$ and $P(B\cap E)>0$:
$$
P(A\mid B,E)=\frac{P(B\mid A,E)\,P(A\mid E)}{P(B\mid E)}.
$$

(And the slides also give LOTP with extra conditioning using the same idea.)
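
A derivation sketch for Theorem 2.4.2, using only the definition of conditional probability. It reads $P(\cdot\mid B,E)$ as $P(\cdot\mid B\cap E)$ and assumes $P(A\cap E)>0$ and $P(B\cap E)>0$:
$$
P(A\mid B,E)
=\frac{P(A\cap B\cap E)}{P(B\cap E)}
=\frac{P(B\mid A,E)\,P(A\mid E)\,P(E)}{P(B\mid E)\,P(E)}
=\frac{P(B\mid A,E)\,P(A\mid E)}{P(B\mid E)}.
$$
By the same reasoning, LOTP with extra conditioning would presumably read as follows for a partition $A_1,\dots,A_n$ with $P(A_i\cap E)>0$ (the slides’ exact statement is not reproduced here):
$$
P(B\mid E)=\sum_{i=1}^n P(B\mid A_i,E)\,P(A_i\mid E).
$$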



## `4. Definition of independence`

### **`Def. 2.5.1`: Independence of two events**
Events $A$ and $B$ are **independent** if:
$$
P(A\cap B)=P(A)\,P(B).
$$
If $P(A)>0$ and $P(B)>0$, this is equivalent to:
$$
P(A\mid B)=P(A)
\quad\text{and also to}\quad
P(B\mid A)=P(B).
$$
Independence is symmetric: if $A$ is independent of $B$, then $B$ is independent of $A$.

### Important note (slides emphasize this)
Independence $\neq$ disjointness.  
If $A$ and $B$ are disjoint, then $P(A\cap B)=0$, so they can be independent only if
$$
P(A)=0 \quad \text{or} \quad P(B)=0.
$$

### Proposition 2.5.3 (closure under complements)
If $A$ and $B$ are independent, then the slides state that these pairs are also independent:
- $A$ and $B^c$
- $A^c$ and $B$
- $A^c$ and $B^c$
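
A short proof sketch for the first pair (the second follows by swapping the roles of $A$ and $B$, the third by applying the same step twice): since $A$ is the disjoint union of $A\cap B$ and $A\cap B^c$,
$$
P(A\cap B^c)=P(A)-P(A\cap B)=P(A)-P(A)\,P(B)=P(A)\,\bigl(1-P(B)\bigr)=P(A)\,P(B^c).
$$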

### **`Def. 2.5.4`: Independence of 3 events**
Events $A,B,C$ are independent if all of these hold:
$$
P(A\cap B)=P(A)P(B),\quad
P(A\cap C)=P(A)P(C),\quad
P(B\cap C)=P(B)P(C),
$$
and
$$
P(A\cap B\cap C)=P(A)P(B)P(C).
$$
If the first three conditions hold, the slides call that **pairwise independence**; it does not by itself imply the fourth condition.


### Example 2.5.5 (pairwise independent but not independent)
Two coin tosses:
- $A$ = “1st is H”
- $B$ = “2nd is H”
- $C$ = “both tosses have same result”

Slides state they are pairwise independent, but:
$$
P(A\cap B\cap C)=\frac14
\quad \text{while} \quad
P(A)P(B)P(C)=\frac18,
$$
so they are **not** independent as a triple.
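
The claims in this example can be checked by brute-force enumeration of the four equally likely outcomes. This is a minimal sketch; the helper names are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Sample space: two tosses of a fair coin, all four outcomes equally likely.
outcomes = list(product("HT", repeat=2))

def prob(event):
    """Probability of an event given as a predicate on outcomes."""
    favourable = sum(1 for w in outcomes if event(w))
    return Fraction(favourable, len(outcomes))

def first_is_heads(w):   # event A
    return w[0] == "H"

def second_is_heads(w):  # event B
    return w[1] == "H"

def same_result(w):      # event C
    return w[0] == w[1]

def intersection(*events):
    """Predicate for the intersection of the given events."""
    return lambda w: all(e(w) for e in events)

A, B, C = first_is_heads, second_is_heads, same_result

# Check the three pairwise product conditions.
pairwise_independent = all(
    prob(intersection(e, f)) == prob(e) * prob(f)
    for e, f in [(A, B), (A, C), (B, C)]
)
triple_lhs = prob(intersection(A, B, C))  # P(A and B and C)
triple_rhs = prob(A) * prob(B) * prob(C)  # P(A) P(B) P(C)

print(pairwise_independent, triple_lhs, triple_rhs)  # True 1/4 1/8
```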



### **`Def. 2.5.7`: Conditional independence**
Events $A$ and $B$ are conditionally independent given $E$ if:
$$
P(A\cap B\mid E)=P(A\mid E)\,P(B\mid E).
$$
The slides warn:
- conditional independence does **not** imply independence
- independence does **not** imply conditional independence
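
Both warnings can be illustrated with small finite examples. The two scenarios below are assumptions chosen for illustration and are not taken from the slides: (1) a coin that is either fair or biased, tossed twice, and (2) two fair tosses conditioned on “same result”.

```python
from fractions import Fraction
from itertools import product

def prob(mass, event):
    """P(event) on a finite sample space given as a dict outcome -> mass."""
    return sum((p for w, p in mass.items() if event(w)), Fraction(0))

def cond(mass, event, given):
    """P(event | given), assuming P(given) > 0."""
    return prob(mass, lambda w: event(w) and given(w)) / prob(mass, given)

# Case 1: pick a fair or a biased coin (probability 1/2 each), then toss it twice.
heads_prob = {"fair": Fraction(1, 2), "biased": Fraction(9, 10)}
mix = {}
for coin, t1, t2 in product(heads_prob, "HT", "HT"):
    p = heads_prob[coin]
    p1 = p if t1 == "H" else 1 - p
    p2 = p if t2 == "H" else 1 - p
    mix[(coin, t1, t2)] = Fraction(1, 2) * p1 * p2

def a1(w): return w[1] == "H"       # first toss is H
def b1(w): return w[2] == "H"       # second toss is H
def e1(w): return w[0] == "biased"  # the biased coin was picked
def a1_and_b1(w): return a1(w) and b1(w)

cond_indep_1 = cond(mix, a1_and_b1, e1) == cond(mix, a1, e1) * cond(mix, b1, e1)
indep_1 = prob(mix, a1_and_b1) == prob(mix, a1) * prob(mix, b1)

# Case 2: two fair tosses; condition on "both tosses have the same result".
fair = {w: Fraction(1, 4) for w in product("HT", repeat=2)}

def a2(w): return w[0] == "H"
def b2(w): return w[1] == "H"
def c2(w): return w[0] == w[1]
def a2_and_b2(w): return a2(w) and b2(w)

indep_2 = prob(fair, a2_and_b2) == prob(fair, a2) * prob(fair, b2)
cond_indep_2 = cond(fair, a2_and_b2, c2) == cond(fair, a2, c2) * cond(fair, b2, c2)

print(cond_indep_1, indep_1)  # True False
print(indep_2, cond_indep_2)  # True False
```

In case 1 the tosses are conditionally independent given the coin but not independent, because the first toss carries information about which coin was picked. In case 2 the tosses are independent, but once we know they agree, the first toss determines the second.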


# `4. Conditional Probability`  *(updated with the new slides)*

## `1. Incomplete information and sub-sigma-algebras`

### Motivation: “conditioning on what you know”
So far we usually conditioned on events (like $P(A\mid B)$). But in real situations we often have **partial information**: we can distinguish some events, but not others.

The slides formalize “what you know” as a **sub-$\sigma$-algebra**
$$
\mathcal G \subseteq \mathcal F,
$$
which contains exactly the events we can tell apart with our limited information.


### How do sub-$\sigma$-algebras come from observations?
- If we observe the actual value of a random variable $Y:\Omega\to\mathcal Y$, then the natural information is
$$
\mathcal G=\sigma(Y),
$$
the $\sigma$-algebra generated by $Y$.

- If we only observe whether an event $E$ happened (coarse information), then
$$
\mathcal G=\{\varnothing, E, E^c, \Omega\}.
$$
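
On a finite sample space, $\sigma(Y)$ can be enumerated explicitly: its atoms are the level sets $\{Y=y\}$, and its events are all unions of atoms. The sketch below assumes a finite $\Omega$; the die and the parity observation are illustrative choices, not from the slides.

```python
from itertools import combinations

def atoms_of(omega, y):
    """Partition omega into the level sets {Y = value}, the atoms of sigma(Y)."""
    levels = {}
    for w in omega:
        levels.setdefault(y(w), set()).add(w)
    return [frozenset(s) for s in levels.values()]

def generated_sigma_algebra(atoms):
    """All unions of atoms (this enumeration is only valid for finitely many atoms)."""
    events = set()
    for r in range(len(atoms) + 1):
        for chosen in combinations(atoms, r):
            events.add(frozenset().union(*chosen))
    return events

# Hypothetical example: a die roll where we only observe the parity Y = w mod 2.
omega = range(1, 7)
atoms = atoms_of(omega, lambda w: w % 2)
sigma_y = generated_sigma_algebra(atoms)

print(len(sigma_y))                  # 4
print(frozenset({1, 2}) in sigma_y)  # False
```

With $E=\{2,4,6\}$ this reproduces the four-element form $\{\varnothing,E,E^c,\Omega\}$: observing parity is the same information as observing whether $E$ happened. The event $\{1,2\}$ is not in $\sigma(Y)$ because parity alone cannot decide it.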

### What should “conditional probability given $\mathcal G$” look like?
Fix an event $A\in\mathcal F$. The slides state that “the updated probability” should be:

1) **Random**: it depends on what was observed, so it must be a random variable.
2) **Consistent**: averaging over all possible observations should recover the original probability (marginalisation).

So we want a **$\mathcal G$-measurable random variable** $Z(\omega)$ that represents “$P(A \mid \text{what we know})$”.


## `2. Bayes’ formula as a ratio of measures`

### **`Def. 2.2.1`: Conditional probability (event version)**
If $A,B\in\mathcal F$ and $P(B)>0$:
$$
P(A\mid B)=\frac{P(A\cap B)}{P(B)}.
$$

### **`Thm. 2.3.1`: Probability of intersection**
For events $A,B$ with positive probabilities:
$$
P(A\cap B)=P(B)\,P(A\mid B)=P(A)\,P(B\mid A).
$$

### **`Thm. 2.3.3`: Bayes’ rule**
$$
P(A\mid B)=\frac{P(B\mid A)\,P(A)}{P(B)}.
$$ 

### **`Def. 2.3.4`: Odds**
$$
\text{odds}(A)=\frac{P(A)}{P(A^c)}.
$$

### **`Thm. 2.3.5`: Odds form of Bayes’ rule**
$$
\frac{P(A\mid B)}{P(A^c\mid B)}
=
\frac{P(B\mid A)}{P(B\mid A^c)}
\cdot
\frac{P(A)}{P(A^c)}.
$$
The factor
$$
\frac{P(B\mid A)}{P(B\mid A^c)}
$$
is the **likelihood ratio**.


## `3. Prior probability and its update`

### Prior vs posterior
- **Prior**: $P(A)$ (before observing evidence)
- **Posterior**: $P(A\mid B)$ (after observing evidence $B$)

Bayes performs the update:
$$
P(A\mid B)=\frac{P(B\mid A)\,P(A)}{P(B)}.
$$

### **`Thm. 2.3.6`: Law of total probability (LOTP)**
If $A_1,\dots,A_n$ is a partition of the sample space $S$ (the same space written $\Omega$ elsewhere in this section) with $P(A_i)>0$ for every $i$, then for any event $B$:
$$
P(B)=\sum_{i=1}^n P(B\mid A_i)\,P(A_i).
$$

### Example 2.3.9 (disease testing): small prior ⇒ surprising posterior
The slides show a case where even a “95% accurate” test can yield
a posterior around $0.16$ when the disease prevalence prior is $0.01$.


### **`Thm. 2.4.2`: Bayes’ rule with extra conditioning**
If $P(A\cap E)>0$ and $P(B\cap E)>0$:
$$
P(A\mid B,E)=\frac{P(B\mid A,E)\,P(A\mid E)}{P(B\mid E)}.
$$


## `1 (extended). Conditional probability given partial information (sub-sigma-algebra)`

### What properties should $P(A\mid \mathcal G)$ satisfy?
For fixed $A\in\mathcal F$, the slides define conditional probability given $\mathcal G$
as a **$\mathcal G$-measurable random variable** $Z$ such that:
$$
\int_G Z(\omega)\,dP(\omega)=P(A\cap G)
\quad \forall\,G\in\mathcal G.
$$
- $Z$ must be $\mathcal G$-measurable (computable from the available information),
- the integral identity is the **consistency condition**.

This object is denoted:
$$
P(A\mid \mathcal G):=Z.
$$
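
As a worked instance (a sketch, assuming $0<P(E)<1$), take the coarse information $\mathcal G=\{\varnothing,E,E^c,\Omega\}$. A natural candidate is
$$
Z(\omega)=P(A\mid E)\,\mathbf 1_E(\omega)+P(A\mid E^c)\,\mathbf 1_{E^c}(\omega),
$$
which is $\mathcal G$-measurable because it is constant on $E$ and on $E^c$. The consistency condition can be checked on each element of $\mathcal G$:
$$
\int_E Z\,dP=P(A\mid E)\,P(E)=P(A\cap E),
\qquad
\int_\Omega Z\,dP=P(A\mid E)\,P(E)+P(A\mid E^c)\,P(E^c)=P(A).
$$
The case $G=E^c$ is symmetric, and $G=\varnothing$ gives $0=0$.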

### Measurability matters (interpretation)

A $\mathcal G$-measurable random variable is **constant on the atoms of $\mathcal G$**.
So you can’t condition on events “finer” than your information allows.  

### RN construction (why Radon–Nikodym appears)
Fix $A\in\mathcal F$ and define a finite measure on $(\Omega,\mathcal G)$ by:
$$
\nu(G):=P(A\cap G),\quad G\in\mathcal G.
$$
Then:
- $\nu$ is finite,
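- and $\nu \ll P|_{\mathcal G}$ (restriction of $P$ to $\mathcal G$), because $P(G)=0$ implies $\nu(G)=P(A\cap G)\le P(G)=0$.  

By the Radon–Nikodym (RN) theorem, there exists a $\mathcal G$-measurable $Z$, unique up to $P$-almost-sure equality, such that: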
$$
\nu(G)=\int_G Z\,dP \quad \forall\,G\in\mathcal G.
$$
We then define:
$$
P(A\mid \mathcal G):=Z
\quad\text{and}\quad
P(A\mid \mathcal G)=\frac{d\nu}{dP|_{\mathcal G}}.
$$




### Indicator / expectation form (slides)
The slides also write:
$$
P(A\mid \mathcal G)=\mathbb E[\mathbf 1_A\mid \mathcal G],
$$
meaning: conditional probability is a special case of conditional expectation.


### Why the RN approach? (slides’ bullet points)
- Ensures $\mathcal G$-measurability + consistency, so averaging recovers unconditional joint probabilities.
- Generalises finite-partition conditioning: if $\mathcal G$ is generated by a finite partition $\{G_i\}$, then $P(A\mid \mathcal G)$ is constant on each atom $G_i$ and matches the usual discrete conditional probabilities.
- Gives a **single object** $P(A\mid \mathcal G)$ defined on $\Omega$, not just a family of numbers $P(A\mid G_i)$ (important in continuous cases).  
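
For the finite-partition case, $P(A\mid\mathcal G)$ can be computed as a function on $\Omega$ and the consistency condition checked for every $G\in\mathcal G$. This is a minimal sketch; it assumes a finite sample space, atoms of positive probability, and a fair die as a hypothetical example.

```python
from fractions import Fraction
from itertools import combinations

def conditional_probability_given_partition(mass, a, atoms):
    """Return Z = P(A | G) as a dict omega -> value, where G is generated by atoms."""
    z = {}
    for atom in atoms:
        p_atom = sum(mass[w] for w in atom)  # assumed positive
        value = sum(mass[w] for w in atom & a) / p_atom
        for w in atom:
            z[w] = value  # Z is constant on each atom
    return z

def satisfies_consistency(mass, a, atoms, z):
    """Check that the integral of Z over G equals P(A and G) for every union G of atoms."""
    for r in range(len(atoms) + 1):
        for chosen in combinations(atoms, r):
            g = frozenset().union(*chosen)
            integral = sum(z[w] * mass[w] for w in g)
            if integral != sum(mass[w] for w in g & a):
                return False
    return True

# Hypothetical example: fair die, A = "at least 4", information = parity of the roll.
mass = {w: Fraction(1, 6) for w in range(1, 7)}
a = frozenset({4, 5, 6})
atoms = [frozenset({1, 3, 5}), frozenset({2, 4, 6})]

z = conditional_probability_given_partition(mass, a, atoms)
consistent = satisfies_consistency(mass, a, atoms, z)
print(z[1], z[2], consistent)  # 1/3 2/3 True
```

The values $1/3$ and $2/3$ are the usual $P(A\mid \text{odd})$ and $P(A\mid \text{even})$, now packaged as one random variable on $\Omega$.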


## `4. Definition of independence`

### **`Def. 2.5.1`: Independence of two events**
Events $A$ and $B$ are independent if:
$$
P(A\cap B)=P(A)\,P(B).
$$
If $P(A)>0$ and $P(B)>0$, equivalently:
$$
P(A\mid B)=P(A)
\quad\text{and}\quad
P(B\mid A)=P(B).
$$

### Independence ≠ disjointness
If $A$ and $B$ are disjoint, then $P(A\cap B)=0$, so they can be independent only if
$$
P(A)=0 \ \text{or}\ P(B)=0.
$$



### Proposition 2.5.3 (closure under complements)
If $A$ and $B$ are independent, then:
- $A$ and $B^c$ are independent
- $A^c$ and $B$ are independent
- $A^c$ and $B^c$ are independent

### **`Def. 2.5.4`: Independence of 3 events**
$A,B,C$ are independent if:
$$
P(A\cap B)=P(A)P(B),\;
P(A\cap C)=P(A)P(C),\;
P(B\cap C)=P(B)P(C),
$$
and
$$
P(A\cap B\cap C)=P(A)P(B)P(C).
$$
If the three pairwise equalities hold, that is **pairwise independence**; it does not by itself imply the triple equality.  



### Example 2.5.5 (pairwise independent but not independent)
Two coin tosses:
- $A$ = “1st is H”
- $B$ = “2nd is H”
- $C$ = “both tosses have same result”

Slides show:
$$
P(A\cap B\cap C)=\frac14
\neq
P(A)P(B)P(C)=\frac18.
$$

### **`Def. 2.5.7`: Conditional independence**
$A$ and $B$ are conditionally independent given $E$ if:
$$
P(A\cap B\mid E)=P(A\mid E)\,P(B\mid E).
$$
Slides warn: conditional independence and independence do not imply each other in general.
