# Data Modeling II: Bayesian Statistics

Bayesian statistics systematically combines our prior knowledge about a situation with new data to refine what we believe is true.
In countless real-world scenarios---ranging from medical diagnostics to fundamental physics experiments---information we already have (like the rarity of a disease or theoretical constraints on a physical parameter) can significantly shape how we interpret fresh evidence.
By framing unknowns as probability distributions, Bayesian methods provide a coherent framework for updating those distributions whenever new observations appear, yielding a posterior that reflects all evidence, old and new.
This unifying perspective makes it possible to quantify uncertainties in a transparent way, avoid common logical pitfalls, and naturally propagate errors to any derived quantities of interest.

## Medical Test "Paradox"

The medical test paradox arises when a diagnostic test is described as highly accurate, yet a person who tests positive for a rare disease has a much lower chance of actually having it than that accuracy suggests.
This seeming contradiction highlights the importance of prior knowledge, or base rates.

Consider a disease that affects only 1% of the population.
Imagine a test that has:
* 99% sensitivity: if you **do** have the disease, it flags you positive 99% of the time.
* 99% specificity: if you **do not** have the disease, it correctly flags you negative 99% of the time.

Many people assume that a "99% accurate" test implies a 99% chance of having the disease if you test positive.
We will see that this is not true when the disease is rare.

### A Simple Counting Argument

Suppose we have 10,000 people.
About 100 of them are diseased (1%).
The remaining 9,900 are healthy.  

Of the 100 diseased people, 99 will test positive (true positives).
Of the 9,900 healthy people, 1% will falsely test positive (99 people), because 99% specificity leaves a 1% false-positive rate.
We end up with a total of 198 positive results: 99 true positives plus 99 false positives.

Hence, only half of these positives (99 out of 198) are truly diseased.
A positive result therefore implies a 50% chance of actually having the disease, which is far lower than 99%.
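
The counting argument is short enough to check numerically. The following is a minimal sketch using the numbers from the text; the variable names are chosen here for illustration only.

```python
# Counting argument for the medical test example (numbers from the text).
population = 10_000
prevalence = 0.01      # 1% of people have the disease
sensitivity = 0.99     # P(test positive | diseased)
specificity = 0.99     # P(test negative | healthy)

diseased = round(population * prevalence)              # 100 people
healthy = population - diseased                        # 9,900 people

true_positives = round(diseased * sensitivity)         # diseased and flagged
false_positives = round(healthy * (1 - specificity))   # healthy but flagged

total_positives = true_positives + false_positives
p_diseased_given_positive = true_positives / total_positives

print(f"true positives:  {true_positives}")
print(f"false positives: {false_positives}")
print(f"P(diseased | positive) = {p_diseased_given_positive:.2f}")
```

The two groups of positives are the same size (99 each), which is why the final ratio comes out to one half.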

### Why This Happens

When a condition is rare, most people do not have it.
A small fraction of a large healthy group (the 1% false-positive rate applied to 9,900 healthy people) can match or exceed the positives from the much smaller diseased group.
This is a direct consequence of prior probability: we have to weigh how common the disease is before we interpret a new test result.
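
To see how strongly the base rate drives the result, one can repeat the same true-versus-false-positive count for several prevalences. The sketch below assumes the 99% sensitivity and specificity from the example; the function name and the list of prevalences are illustrative choices, not part of the original discussion.

```python
def prob_diseased_given_positive(prevalence, sensitivity=0.99, specificity=0.99):
    """Fraction of positive tests that are true positives."""
    true_pos = sensitivity * prevalence                # diseased and flagged
    false_pos = (1 - specificity) * (1 - prevalence)   # healthy but flagged
    return true_pos / (true_pos + false_pos)

# Illustrative prevalences: from very rare to as common as a coin flip.
for prevalence in (0.001, 0.01, 0.1, 0.5):
    p = prob_diseased_given_positive(prevalence)
    print(f"prevalence = {prevalence:>5.1%}  ->  P(diseased | positive) = {p:.3f}")
```

With the test held fixed, the rarer the disease, the smaller the fraction of positives that are real. The quoted 99% is only recovered when diseased and healthy people are equally common.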

## An Intuitive Derivation of Bayes' Theorem

Bayes' Theorem emerges directly from the definition of **conditional probability**.
We start with $P(A \mid B)$, which is read as "the probability of $A$ given that $B$ occurred."
By definition, this is the fraction of times both $A$ and $B$ happen, out of all times $B$ happens:
\begin{align}
P(A \mid B) = \frac{P(A \cap B)}{P(B)}.
\end{align}

Here, $P(A \cap B)$ is the joint probability that both events occur.
We can also express this joint probability in another way:
\begin{align}
P(A \cap B) = P(B \mid A)\,P(A).
\end{align}

Placing this back into our conditional probability formula gives:
\begin{align}
P(A \mid B)
= \frac{P(B \mid A)\,P(A)}{P(B)}.
\end{align}

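To evaluate the denominator, we split the occurrences of $B$ into two disjoint cases, those where $A$ also occurs and those where it does not (written $\bar{A}$), and add their probabilities: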
\begin{align}
P(B) = P(B \mid A)\,P(A) \;+\; P(B \mid \bar{A})\,P(\bar{A}).
\end{align}

Putting this all together yields **Bayes' Theorem**:
\begin{align}
P(A \mid B)
= \frac{P(B \mid A)\,P(A)}
       {P(B \mid A)\,P(A) \;+\; P(B \mid \bar{A})\,P(\bar{A})}.
\end{align}

We can connect each term to our **medical test paradox**. Let $A$ be the event "has the disease" and $B$ the event "tests positive." Then:
* $P(A)$ is the **prevalence** (1%).
* $P(\bar{A})$ is the chance of not having the disease (99%).
* $P(B \mid A)$ is the **sensitivity** (99%).
* $P(B \mid \bar{A})$ is the **false-positive rate** (1%, i.e. one minus the 99% specificity).

When we substitute these numbers, we reproduce the counting argument: $0.0099 / (0.0099 + 0.0099) = 0.5$, a 50% probability of having the disease if you test positive.
This result might seem surprising at first, but it follows naturally once we include both the **base rate** of the disease and the test's **accuracy**.
Bayes' Theorem thus formalizes the intuition behind "counting true positives vs. false positives" and ensures we do not overlook the large fraction of healthy individuals in the population.
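
As a check on the substitution, here is a small sketch of Bayes' Theorem as a function. The name `bayes_posterior` and its argument names are hypothetical, and `fractions.Fraction` is used only so that the arithmetic stays exact.

```python
from fractions import Fraction

def bayes_posterior(prior, likelihood, likelihood_given_not):
    """Return P(A | B) given P(A), P(B | A), and P(B | not A)."""
    # Denominator P(B): B together with A, plus B together with not-A.
    evidence = likelihood * prior + likelihood_given_not * (1 - prior)
    return likelihood * prior / evidence

posterior = bayes_posterior(
    prior=Fraction(1, 100),                 # prevalence, P(A)
    likelihood=Fraction(99, 100),           # sensitivity, P(B | A)
    likelihood_given_not=Fraction(1, 100),  # false-positive rate, P(B | not A)
)
print(posterior)         # → 1/2
print(float(posterior))  # → 0.5
```

The exact result of 1/2 agrees with the 99-out-of-198 count from the counting argument.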

This same line of reasoning applies to many physics and data-modeling scenarios.
We often start with a **prior** for a parameter (like the prevalence in the medical example) and then update it with **likelihood** information from new observations.
Bayes' Theorem tells us how to combine both pieces of information in a consistent way, yielding a **posterior probability** that captures our updated understanding of the system.

### Why Bayes' Theorem Matters

The key power of Bayes' Theorem is that it forces us to incorporate the **prior probability** $P(A)$ before we look at new evidence $B$.
Once the data (test results) come in, we use the likelihood $P(B \mid A)$ to update this prior, producing the **posterior probability** $P(A \mid B)$.
In the medical context, the update reveals that a single positive result, weighed against a low prevalence, might not be enough for a confident diagnosis.
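
One way to illustrate the prior-to-posterior update is to apply it twice, feeding the posterior from a first positive test back in as the prior for a second. The sketch below rests on an assumption that is not part of the example above: that a second test is available and that its result is independent of the first, given the person's true disease status. The function name `update` is hypothetical.

```python
def update(prior, sensitivity=0.99, false_positive_rate=0.01):
    """One Bayesian update of P(disease) after a positive test result."""
    evidence = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / evidence

belief = 0.01  # start from the prevalence as the prior
# ASSUMPTION: repeated tests are conditionally independent given disease status.
for test_number in (1, 2):
    belief = update(belief)
    print(f"after positive test {test_number}: P(disease) = {belief:.3f}")
```

Under that independence assumption, the first positive raises the probability from 1% to about 50%, and a second positive raises it to about 99%. Real repeat tests often share sources of error, so the actual gain may be smaller.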