# Bayes' Rule


## Motivating Example -- Part 1

Consider again a motivating example from the Introduction: 

**A 40 year-old woman's mammogram comes back positive for a breast cancer tumor. What is the probability that the woman has breast cancer?**

This problem can be modeled as a *stochastic system*:
````{panels}
DEFINITION
^^^
stochastic system
: A system with one or more inputs and one or more outputs, for which the output(s) is (are) not a deterministic function of the input(s).
````

```{note}
One of the most challenging problems for students learning probability is taking a word problem, translating it into mathematics, identifying and finding the necessary information, and solving the problem.  There are several keys to being successful at this. The first two are:

**1.** Identify all of the phenomena (inputs, outputs, state) that may be random. Note that we should identify phenomena that may be random **even if we are given that a particular outcome, event, or realization occurred for that phenomenon**.
**2.** Introduce clearly-defined mathematical terminology for all of the random phenomena in the problem.
```

In this problem, we use randomness to model our lack of knowledge.  For example, for the woman under consideration, we do not know whether she has breast cancer or not. Thus, whether the woman actually has cancer is one random phenomenon. Although we are told that the mammogram came back positive, the mammogram result is, in general, a random phenomenon that depends on whether the woman being tested has breast cancer. These two are the only random phenomena in this problem, so we can now introduce notation.  Let
* $C$ be the event that a woman is diagnosed with breast cancer in a given year
* $D$ be the event that a mammogram detects breast cancer in a woman in a given year

```{note}
The next steps in solving a probability word problem are:

**3.** Translate the probability that is being asked about from words to mathematics.
**4.** Use probability techniques to formulate a solution for the problem in terms of probabilities that are known or that can be found from a reliable source.
```

Again, one of the hard things for people learning probability is simply writing down a formula for the probability that the problem is asking about.  For instance, this problem states: "What is the probability that the woman has breast cancer?"  So, some readers may translate this probability to $P(C)$.

**That formula is not the probability that this problem is asking about!** The reason is that we need to interpret that sentence in terms of the other information given in the problem. In this case, the first sentence states "A 40 year-old woman's mammogram comes back positive for a breast cancer tumor". In other words, we are **given** that the mammogram came back positive for cancer, which in our mathematical notation is $D$.  Thus, the probability that this problem is seeking is $P(C|D)$, which is the *conditional* probability that the woman has cancer *given* that her mammogram detected cancer.

To begin to interpret this probability, the reader needs to understand that medical tests are never perfect.
When a mammogram detects breast cancer, that does *not* necessarily mean that a woman has breast cancer. If a mammogram detects breast cancer and a woman does not actually have breast cancer, that is called a *false alarm*. The probability of a false alarm is $P(D|\overline{C})$, the probability of a detection given that no cancer is present. You can find information on false alarm rates for mammograms on the Susan G. Komen Foundation's web pages at 

https://www.komen.org/breast-cancer/screening/mammography/accuracy/

Based on the information on that page, we will take $P(D|\overline{C})=0.1$.

If a woman does have cancer, a mammogram should detect it most of the time. The probability of detecting an effect given that it is actually present is called the *sensitivity* of the test. From the same Susan G. Komen Foundation web page, the sensitivity of a mammogram is $P(D|C)=0.87$.

```{note}
As a reminder, we should not expect $P(D| \overline{C})$ and $P(D|C)$ to add to 1 because they are computed using different conditional probability measures. 
```

If a woman does have cancer, but the mammogram fails to detect it, it is called a *miss*. The probability of a miss is $P(\overline{D}|C)=1- P(D|C)=0.13$.

The probabilities $P(D|C)$, $P(\overline{D}|C)$, $P(D| \overline{C})$ and $P(\overline{D}|\overline{C})$ represent the probabilities of observing an effect given some true state (or input). These are called *likelihoods*:


````{panels}
DEFINITION
^^^
likelihood (discrete stochastic systems)
: Consider a stochastic system with a discrete set of possible input events $\{ A_0, A_1, \ldots \}$ and a discrete set of possible output events $\{B_0, B_1, \ldots\}$. Then the *likelihoods* are the conditional probabilities of the output events given the input events:
$$
P(B_j| A_i).
$$
````
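As a minimal sketch (the dictionary layout and the names `likelihoods`, `row_sums`, and `column_sum_D` are our own illustrative choices, not part of the original example), the four likelihoods for the mammogram problem can be collected in a small table. The two values taken from the text are $P(D|C)=0.87$ and $P(D|\overline{C})=0.1$; the other two follow from complements. The sketch also checks the earlier reminder: likelihoods that share the same conditioning event sum to 1, but likelihoods with different conditioning events generally do not.

```python
# Likelihoods P(output | input) for the mammogram example.
# Outer keys: true state (input). Inner keys: test result (output).
# "not C" and "not D" stand for the complements of C and D.
likelihoods = {
    "C":     {"D": 0.87, "not D": 1 - 0.87},  # sensitivity, miss
    "not C": {"D": 0.10, "not D": 1 - 0.10},  # false alarm, correct rejection
}

# For a fixed input, the likelihoods over all outputs sum to 1.
row_sums = {state: sum(outputs.values())
            for state, outputs in likelihoods.items()}

# For a fixed output, the likelihoods over the inputs need not sum to 1,
# because they come from different conditional probability measures.
column_sum_D = sum(outputs["D"] for outputs in likelihoods.values())

print("Sums over outputs for each input:", row_sums)
print("P(D|C) + P(D|not C) =", column_sum_D)
```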


Now we have a lot of different probabilities, but none of these are of the form $P(C|D)$. Instead, we only have probabilities like $P(D|C)$, $P(D|\overline{C})$, etc. (the likelihoods).  

The probability $P(C|D)$ is the probability of the true state (or input) given the observed effect. It is called an *a posteriori* or *posterior* probability, meaning that it is **after the observation or measurement**:

````{panels}
DEFINITION
^^^
*a posteriori* probability (discrete stochastic system)
: Consider a stochastic system with a discrete set of possible input events $\{ A_0, A_1, \ldots \}$ and a discrete set of possible output events $\{B_0, B_1, \ldots\}$. Then the *a posteriori probabilities* are the conditional probabilities of the input events given the observed outputs:
$$
P(A_i | B_j).
$$

*A posteriori* probabilities are sometimes abbreviated APPs and sometimes called *posteriors*.
````

We need to develop a new formula to calculate an *a posteriori* probability from the likelihoods (and some additional information).


## Deriving Bayes' Rule

Bayes' Rule is a technique for expressing probabilities of the form $P(X|Y)$ in terms of probabilities of the form $P(Y|X)$ and some additional information. Thus, Bayes' rule can be used to find the *a posteriori probabilities* for our motivating problem. 

We begin by expressing the desired probability using the definition of conditional probability:

$$
P(C|D) = \frac{P (C \cap D)}{P(D)}
$$

We can apply the chain rule, $P(C \cap D) = P(D|C)P(C)$, to express $P(C \cap D)$ in terms of a likelihood:
$$
P(C|D) = \frac{P (D|C)P(C)}{P(D)}.
$$
Note, however, that in addition to the likelihood, we also need $P(C)$. $P(C)$ is the probability of cancer without taking into account the mammogram, and is called an *a priori* probability:

````{panels}
DEFINITION
^^^
*a priori* probability (discrete stochastic system)
: Consider a stochastic system with a discrete set of possible input events $\{ A_0, A_1, \ldots \}$. Then the *a priori probabilities* are the probabilities of the input events before any output is observed or measured:

$$
P(A_i).
$$
*A priori* probabilities are sometimes called *priors* because they are the probabilities of the inputs *prior* to the output being observed or measured.
````

We are still left with $P(D)$, which we don't know. However, we can apply the Law of Total Probability to express $P(D)$ in terms of the likelihoods and *a priori* probabilities. Note that $C$ and $\overline{C}$ form a partition of $S$, so

$$
P(D) = P(D|C) P(C) + P(D| \overline{C}) P(\overline{C}).
$$
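The expansion above can be sketched in Python. The function name `total_probability_of_detection` and its parameter names are hypothetical, and the prior values in the loop are arbitrary placeholders chosen only to exercise the formula; the text has not yet supplied a value for $P(C)$. The default likelihoods are the values quoted earlier.

```python
def total_probability_of_detection(p_c, p_d_given_c=0.87, p_d_given_not_c=0.10):
    """Law of Total Probability over the partition {C, not C}:

    P(D) = P(D|C) P(C) + P(D|not C) P(not C)
    """
    if not 0.0 <= p_c <= 1.0:
        raise ValueError("p_c must be a probability in [0, 1]")
    return p_d_given_c * p_c + p_d_given_not_c * (1.0 - p_c)


# Placeholder priors (NOT real prevalence data), used only to show
# how P(D) moves between the two likelihoods as P(C) changes.
for p_c in (0.0, 0.5, 1.0):
    print(f"P(C) = {p_c}: P(D) = {total_probability_of_detection(p_c)}")
```

At the extremes, $P(D)$ reduces to the false alarm probability (when $P(C)=0$) or to the sensitivity (when $P(C)=1$), which is a quick sanity check on the formula.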

Finally, we have our formula for the desired *a posteriori* probability,

$$
P(C|D) = \frac{P (D|C)P(C)}{P(D|C) P(C) + P\left(D| \overline{C}\right) P\left(\overline{C}\right)}.
$$
This last formula is a form of **Bayes' Rule**:

````{panels}
DEFINITION
^^^
Bayes' Rule (discrete stochastic system)
: Consider a stochastic system with a discrete set of possible input events $\{ A_0, A_1, \ldots \}$ and a discrete set of possible output events $\{B_0, B_1, \ldots\}$.  Then the *a posteriori* probabilities $P(A_i|B_j)$ can be written in terms of the likelihoods ($P(B_j|A_i)$) and the *a priori* probabilities ($P(A_i)$) as 

$$
P(A_i|B_j) = \frac{P\left(B_j \left \vert A_i \right. \right) P\left( A_i \right)}
{ \sum_k P\left(B_j \left \vert A_k \right. \right) P\left( A_k \right)}
$$

Note that the form of the numerator and the summands in the denominator are the same. 
````
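The general form of Bayes' Rule might be sketched as follows. The function name `posteriors`, its argument layout, and the three-input example values are all hypothetical and chosen only for illustration. The sketch assumes the input events partition the sample space, so the priors sum to 1.

```python
def posteriors(priors, likelihoods_for_output):
    """Return the list of P(A_i | B_j) for one observed output B_j.

    priors[i]                 = P(A_i)
    likelihoods_for_output[i] = P(B_j | A_i)
    """
    if len(priors) != len(likelihoods_for_output):
        raise ValueError("need one likelihood per input event")
    if abs(sum(priors) - 1.0) > 1e-9:
        raise ValueError("priors must sum to 1")

    # Numerators of Bayes' Rule: P(B_j | A_i) P(A_i) for each i.
    joint = [lik * prior for lik, prior in zip(likelihoods_for_output, priors)]

    # Denominator: the sum of those same terms (Law of Total Probability).
    evidence = sum(joint)
    if evidence == 0:
        raise ValueError("the observed output has probability zero")

    return [term / evidence for term in joint]


# Hypothetical three-input system (values are illustrative only).
example = posteriors(priors=[0.5, 0.3, 0.2],
                     likelihoods_for_output=[0.9, 0.5, 0.1])
print(example)
```

Because the denominator is just the sum of the numerators, the returned posteriors always sum to 1, whatever the likelihoods are.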

## Motivating Example -- Part 2

Now that we have applied Bayes' Rule to our problem, we have a formula for $P(C|D)$ in terms of the likelihoods (which we know) and the *a priori* probabilities (which we do not yet know).
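Since $P(C)$ has not yet been specified, one way to explore the formula is to treat the prior as a free parameter. The sketch below does this; the function name `posterior_cancer_given_detection` is hypothetical, and the prior values in the loop are arbitrary placeholders, not prevalence data. The default likelihoods are the values quoted in Part 1.

```python
def posterior_cancer_given_detection(p_c, p_d_given_c=0.87, p_d_given_not_c=0.10):
    """Bayes' Rule for the two-input case:

    P(C|D) = P(D|C) P(C) / [ P(D|C) P(C) + P(D|not C) P(not C) ]
    """
    numerator = p_d_given_c * p_c
    denominator = numerator + p_d_given_not_c * (1.0 - p_c)
    return numerator / denominator


# Placeholder priors (assumptions for illustration only).
for p_c in (0.001, 0.01, 0.1, 0.5):
    p_post = posterior_cancer_given_detection(p_c)
    print(f"P(C) = {p_c:<5}  ->  P(C|D) = {p_post:.3f}")
```

The posterior depends strongly on the prior: with the same test, a small $P(C)$ yields a small $P(C|D)$ even though the sensitivity is high.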
